ABSTRACT


This thesis is an application of logistic regression and survival analysis techniques 
to the study of current estimated potential (CEP), manpower performance, and attrition 
behaviour in the Singapore military. The manpower data includes both active (30%)
and reserve (70%) personnel who entered service from as early as the late fifties to as
recently as 1992. The covariates under consideration are education level, academic or
overseas military training award, current rank, length of service, rank seniority, age, 
salary grade, previous year’s annual performance grade and CEP estimates. 

The study identifies the covariates that explain the CEP and annual performance 
for the binary and polytomous models of the officers who were still on active duty as 
of 31 Dec 1992. It also examines the trend of attrition behaviour of officers using data 
from both the active and reserve personnel. 

The results of the study show that (1) a higher education level does not necessarily
result in a better performance grade, although it seems to indicate a higher CEP;
(2) the higher the rank of an officer, the more likely he is to have a poorer
performance grade than when he was in the previous rank; (3) education level is a
significant covariate of the survival functions; and (4) Engineering officers generally
have a higher attrition rate than the other service support officers.

EXECUTIVE SUMMARY


Manpower planners and recruitment agencies in Singapore’s DOD are keen
to identify the various explanatory variables that could be used to explain current
estimated potential (CEP - an estimate of an officer’s command capacity by 45 years
of age) and performance. The Government’s advocacy for family planning in the
seventies has resulted in a reduction of eligible males who could be recruited for a 
military career in the nineties. If the current attrition of military officers is not properly 
checked, then at the turn of the century the military would have a mammoth task in 
keeping up with its operational manning requirements. Identifying the significant 
covariates and trends of attrition would greatly assist the responsible agencies in force 
planning and formulation of manpower policies. 

In view of the above, two techniques are employed in this thesis. First, the 
logistic regression technique is used to identify the significant covariates that could 
explain and predict CEP and performance grade. Two models are considered, namely, 
the binary and polytomous logistic regression. The covariates under consideration are 
education level, academic or overseas military training award, current rank, length of 
service, rank seniority, age, salary grade, previous year’s annual performance grade and 
CEP estimates. 

Second, the survival analysis technique is used to analyze the trend of the attrition
behaviour of officers who entered service during the periods 1965-70, 1971-76, and
1977-82. A graphical approach, which does not require any statistics background, is
used to examine the attrition trends. However, formal statistical tests are conducted
to confirm the visual observations.

For the CEP binary response model, the data is divided into two groups. The first
group consists of officers who have a CEP rank of Major and below while the second
group consists of officers who have a CEP rank of at least Lieutenant Colonel. A
standard measure of the quality of model prediction using a cutoff point of 0.40 resulted 
in approximately 87% correct classification for each group. As for the performance 
binary response model, the data is also divided into two groups. The first group 
consists of officers who have a performance grade of B minus and below in the 1992 
performance appraisal. The second group consists of officers who have a performance 
grade of at least a B in the 1992 performance appraisal. A cutoff point of 0.64 would 
result in each group being approximately 74% correctly classified. 
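
The per-group classification measure described above can be illustrated with a minimal sketch. The fitted probabilities, group labels, and the rule "classify into the second group when the fitted probability is at least the cutoff" are assumptions made for illustration; they are not the thesis data or its SAS output.

```python
def classification_rates(probs, labels, cutoff):
    """Return the fraction correctly classified in group 0 and in group 1.

    Assumption: an officer is assigned to group 1 when the fitted
    probability is at least `cutoff`, and to group 0 otherwise.
    """
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        total[y] += 1
        if predicted == y:
            correct[y] += 1
    return correct[0] / total[0], correct[1] / total[1]


# Hypothetical fitted probabilities and observed groups (not thesis data).
probs = [0.10, 0.35, 0.45, 0.30, 0.80, 0.55, 0.38, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rate0, rate1 = classification_rates(probs, labels, cutoff=0.40)
```

Moving the cutoff trades correct classification in one group against the other, which is why a cutoff that balances the two rates is reported for each model.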

The CEP polytomous response model has an 82% correct prediction capability 
when the fitted model is tested on a second population of officers as compared to 68% 
for the performance model. 

The significant findings are outlined below.

• Education Level - Education level is not a significant predictor of performance,
though a higher education level seems to give an indication of higher CEP.

• Training Award - There is insufficient evidence to support the notion that officers
given an academic or overseas military training award tend to have a better
performance grade than those who did not receive any.

• Rank - The higher the rank of an officer, the more likely it is for him to get a
poorer performance grade than when he was in the previous rank.

• Previous Year’s CEP and Performance Grade - Current year’s CEP estimation
and performance grade prediction are highly correlated with the previous year’s CEP
and performance grade.


The results of the survival analysis are briefly outlined below. 


• Non-Graduate vs Graduate - The attrition behaviour in each of the three
enlistment periods (officers who entered service during 1965-1970, 1971-1976,
and 1977-1982) is not significantly different between non-graduates and
graduates.

• Education Level - Education level has a strong relationship with the attrition
behaviour of the officers. Officers with a Cambridge General Certificate of
Education (GCE) ’O’ or ’A’ level qualification have consistently survived longer
in the service than officers who have other educational qualifications. In
contrast, officers with a diploma qualification exhibit the lowest survival functions.

• Training Award - The trend of the difference in the survival functions between
non-award and award holders for the three enlistment periods is statistically the
same.

• Support Vocation - The Engineering and Air Force support officers have the
highest attrition rate during the first year of service. It drops to the lowest at the
beginning of the third year, after which the attrition rates of the Engineering
officers are generally higher than those of the other two categories of officers. The
Army support officers exhibit a relatively constant attrition rate throughout the
entire period of study.

• Service Group - For the first six years of service, the Naval officers have a lower
risk of leaving the service than their Army counterparts. After the first six years,
the converse is true.










I. INTRODUCTION 


A. BACKGROUND 

In manpower studies, much attention is given to job changes, layoffs, retirements, 
performance appraisal, and promotions. Very often performance appraisal and 
promotion go hand-in-hand. In Singapore’s military organization, staff performance 
appraisal is carried out annually. The military officers’ promotions are based on this 
annual assessment. 

The annual assessment consists of two parts. The first part is the officer’s
aggregated annual performance appraisal, which encompasses job performance, work
attitudes and personal qualities. Job performance is assessed through factors such
as initiative, planning ability, applied knowledge, quality of work, and decision making.
Work attitude is assessed through factors like drive and determination,
responsibility, and teamwork. Personal qualities are assessed through factors like
the officer’s writing ability, oral expression, stability in stressful situations, human
relations, and, last but not least, leadership qualities. All these factors are scored on a
numeric scale with 1 being the highest possible and 7 being the lowest. All that is
required of the reporting officer is to tick the box corresponding to the score to be
awarded to the factor under consideration. Finally, the overall performance
is an aggregate score based on the assessment of job performance, work attitudes, and
personal qualities. It is given on a numeric scale from 1 to 15 with 15 representing an
A+, 14 an A, 13 an A-, 12 a B+, 11 a B, and so on.

The second part assesses the officer’s current estimated potential (CEP). The CEP 
measure is a military rank assessment. It is an estimate of the officer’s command 
capacity by 45 years of age (e.g. Rank: LTC, Appointment: Bn Comd/CO of Trg 
School). This assessment is independent of the above performance appraisal. Here, the 
officer is being assessed on his ability to approach a problem from a higher vantage 
point (known as the Helicopter Quality). This includes his ability to detect quickly and 
attend to relevant details within a broader context, and be constantly able to provide 
solutions of good vision. The officer is also assessed on his powers of analysis, 


imagination, and sense of reality when faced with complex and unfamiliar problems. 


B. PROBLEM STATEMENT 

Currently, not much work has been done in the area of annual CEP and 
performance prediction of combat officers. Manpower planners and recruitment 
agencies are keen to identify the various explanatory variables that could be used to 
explain CEP and performance. In many military organizations, education level has by 
far proved to be a valuable predictor of performance. Is education level a valuable 
predictor of performance in the Singapore context? Is education level also a good 
explanatory variable for CEP estimation? 

Some of the officers are awarded academic or overseas military training to
increase their knowledge and professionalism during their careers as military officers.
Will an officer who is given such an award perform significantly better than those who
are not given any training award?

Another area of interest asks whether there is any significant difference in 
performance and CEP among officers of differing vocations. 

Family planning in the seventies has drastically reduced the population of eligible
males who could be recruited for a military career in the nineties. The military has
to compete with civilian organizations for this limited pool of manpower. To alleviate
the problem of manpower shortages, the military has to ensure that the attrition level
of the officers is under control. A high attrition level will disrupt the efficiency and
readiness of the military as a whole. It is also costly since new officers have to be
recruited and time is needed to train them to a proficiency level comparable to that of
their predecessors. Hence, the factors that affect the length of service of an officer are
also of great interest to military commanders, manpower planners and recruitment
agencies. Identifying these factors could greatly assist the responsible agencies in force
planning and formulation of manpower policies.


C. THESIS OVERVIEW 


1. Objective 
This thesis examines the relationship between an officer’s covariates (past
performance and CEP assessments, education level, training award, current rank,
seniority in current rank, age, length of service) and (a) future CEP estimation, and (b)
the prediction of future performance. It also investigates the attrition behaviour of
officers who entered service during the periods 1965-70, 1971-76, and 1977-82
as a function of education level, training award, support vocation and service type.

The primary interest is to identify those covariates that could significantly
explain an officer’s CEP assessment and performance appraisal. The secondary interest
is to examine the attrition pattern of officers who entered service during the periods
1965-70, 1971-76, and 1977-82.


2. Methodology 

The study is divided into two parts. The first part uses the logistic
regression technique to estimate CEP and predict performance. The simplest model is
the binary response model. It is used to model dichotomous outcomes, for example,
whether an officer’s CEP estimate would be of Major (MAJ) rank and below, or
Lieutenant Colonel (LTC) rank and above. In contrast, the polytomous response model
provides more information because the response is no longer restricted to two
levels. In this thesis, the CEP model has four levels, namely CPT, MAJ, LTC, and
COL and above. The tradeoff is that the polytomous response model is more
difficult to evaluate and to explain to the novice.

The second part of the study uses survival analysis techniques to compare
the attrition patterns of officers who enlisted in the three different periods. This
thesis examines only the individual effects of each covariate, namely, education, training
award, support vocation, and service group.








3. Findings 

The binary and polytomous response models both give the same significant
covariates. For the CEP model, the significant covariates are education, current rank,
rank seniority, age, and previous year’s annual performance grade and CEP. For the
performance model, the significant covariates are current rank, rank seniority, and
previous year’s annual performance grade and CEP.

For the CEP response model, the findings indicate that a highly educated,
young, high-ranking officer is more likely to have a CEP estimate of at least LTC
rank. Additionally, the higher the previous year’s performance grade and CEP estimate,
the higher the probability that the officer’s CEP is at least LTC rank.

For the annual performance response model, an interesting result is found.
The higher the rank of an officer, the more likely it is for him to have a poorer
performance grade than when he was in the previous rank. This could be a direct result
of quotas placed on the performance grades.

Education level is found to have a significant effect on the attrition
behaviour of the officers for the three enlistment groups under study. Generally, the
Engineering Support officers seem to have a higher risk of leaving the service than the
Army and Air Force Support officers.


4. Organization 
The organization of this thesis follows the order in which the study was
performed. Chapter II describes the methodology of binary and polytomous logistic
regression, and the survival analysis technique used in the thesis. Chapter III gives a
summary of the exploratory analysis of the population under study. It also contains a
brief description of the covariates and a code book. Chapter IV presents the binary
models for future CEP estimation and performance prediction. Evaluation of the
models developed is also discussed in detail. Chapter V presents the polytomous
models for future CEP estimation and performance prediction. Chapter VI contains
analyses of single covariate effects on the attrition behaviour of officers enlisted during
three different time periods. Chapter VII contains the conclusions and a summary of
the findings, together with the recommendations for future work.


II. METHODOLOGY


A. LOGISTIC REGRESSION 

Linear logistic regression is one of the many special cases of generalized linear
models. A generalized linear model is characterized by three components: a random
component, which identifies the probability distribution of the response variable; a
systematic component, which specifies a linear function of explanatory variables that is
used as a predictor; and a link function describing the functional relationship between
the systematic component and the expected value of the random component.
[Ref. 1:p. 80]

The linear logistic regression technique fits the model for binary or ordinal response
data using the method of maximum likelihood. Logistic regression models have been in
use in statistical analyses for many years. They are frequently used when an individual
is to be classified into one of two or more groups. In the past, logistic regression found
most of its application in the medical field [Ref. 2:p. vii]. It has been used, for example, to
predict the survival of critically ill patients who are admitted to an intensive care unit 
as a function of certain physiological variables. Its application has expanded from 
health sciences to many other fields such as sociology, criminology, marketing and 
manpower studies. 


The fundamental assumption in linear logistic regression analysis is that the natural
logarithm of the odds is linearly related to the independent covariates. Here, the odds is
defined as the ratio of the probability that an event occurs to the probability that it
does not occur.

Variable selection is necessary when there are many candidate covariates for 
model building. Three commonly used methods are: forward selection, backward 
elimination, and stepwise selection. In this thesis, the stepwise variable selection 
procedure of the Statistical Analysis System (SAS) software package is used for 
variable selection. The stepwise method combines both the forward selection and 


backward elimination methods. [Ref. 3:p. 196] 


1. Binary Response Model 
In the binary response model, the response variable is binary or
dichotomous. The response of an individual can take on one of two possible values,
denoted for convenience by 0 and 1. Observations of this nature arise, for instance,
when an individual has either been promoted (Y=1) or has not (Y=0) in the annual
staff promotion exercise. We may then define

\[ \mathrm{pr}(Y=0) = 1 - \pi; \qquad \mathrm{pr}(Y=1) = \pi \tag{1} \]

for the probabilities of ’failure’ (not promoted) and ’success’ (promoted) respectively.
The probability of an officer’s promotion would be related to his characteristics such
as annual performance grade and CEP.


The goal of this analysis is to find the best fitting and most parsimonious
yet practical and reasonable model to describe the relationship between the response
variables (annual performance grade and CEP) and a set of independent explanatory
variables. These independent variables are often known as covariates. The term
“explanatory variable” will be used interchangeably with “covariate” throughout this
thesis.

A wide choice of link functions $g(\pi)$ is available to describe the functional
relationship between the probability distribution of the response variable and the linear
function of explanatory variables [Ref. 4:p. 108]. Three functions commonly used in
practice are:

• the logit or logistic function
\[ g_1(\pi) = \log\{\pi/(1 - \pi)\}; \]
• the probit or inverse Normal function
\[ g_2(\pi) = \Phi^{-1}(\pi); \]
and
• the complementary log-log function
\[ g_3(\pi) = \log\{-\log(1 - \pi)\}. \]

A fourth possibility, the log-log function
\[ g_4(\pi) = -\log\{-\log(\pi)\}, \]
which is the natural counterpart of the complementary log-log function, is seldom used
because its behaviour is inappropriate for $\pi < 1/2$, the region that is usually of interest.
All four functions can be obtained as the inverses of well-known cumulative distribution
functions having support on the entire real axis. The first two functions are
symmetrical in the sense that
\[ g(\pi) = -g(1 - \pi). \]
The latter two functions are not symmetrical in this sense, but are related via
\[ g_3(\pi) = -g_4(1 - \pi). \]
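
The four link functions and the two relations stated above (symmetry of the logit and probit, and the pairing of the complementary log-log with the log-log function) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the test value 0.3 are chosen for illustration and are not taken from the thesis.

```python
import math
from statistics import NormalDist


def logit(p):
    """Logit link: log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Probit link: inverse of the standard Normal distribution function."""
    return NormalDist().inv_cdf(p)


def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))


def loglog(p):
    """Log-log link."""
    return -math.log(-math.log(p))


p = 0.3  # arbitrary illustrative probability
# Symmetry g(p) = -g(1 - p) means g(p) + g(1 - p) should be zero.
sym_logit = logit(p) + logit(1.0 - p)
sym_probit = probit(p) + probit(1.0 - p)
# The last two links are related by cloglog(p) = -loglog(1 - p).
pair = cloglog(p) + loglog(1.0 - p)
# The complementary log-log link is not symmetric on its own.
asym_cloglog = cloglog(p) + cloglog(1.0 - p)
```

All of `sym_logit`, `sym_probit`, and `pair` are zero up to rounding error, while `asym_cloglog` is not.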
a. Advantages of Logistic Function
The logistic function is used in this thesis because of its simple
interpretation as the logarithm of the odds, $\pi/(1 - \pi)$. Apart from this, the logistic
function has one important advantage over all alternative transformations in that it is
eminently suited for the analysis of data collected retrospectively. [Ref. 4:p. 109]


b. Parameter Interpretation
If a linear logistic model is used with p covariates, then we would have
the model

\[ \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \tag{2} \]

for the log odds of a positive response (’success’ or, say, promoted). Throughout this
thesis, the term "log" refers to the "natural logarithm". Equivalently, in terms of the
probability of a positive response, Equation (2) can be rewritten as

\[ \pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}. \tag{3} \]

This is the inverse function of $g_1(\pi)$. Assuming that the covariates are functionally
unrelated, the effect of a unit change in $x_1$ is to increase the log odds by an amount $\beta_1$.
In other words, we may say that a unit change in $x_1$ has the effect of increasing the
odds of a positive response multiplicatively by the factor $\exp(\beta_1)$. It is important that
all the other covariates (i.e. $x_2, x_3, \ldots, x_p$) are held fixed and not permitted to vary as
a consequence of the change in $x_1$. [Ref. 4:p. 110]
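
The multiplicative interpretation of a coefficient can be verified with a small numerical sketch. The coefficient values and covariate values below are hypothetical and chosen only for illustration; they are not estimates from the thesis.

```python
import math


def inv_logit(eta):
    """Probability of a positive response for linear predictor eta.

    Algebraically equal to exp(eta) / (1 + exp(eta)).
    """
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds of a positive response with probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients for a model with two covariates.
beta0, beta1, beta2 = -1.0, 0.5, 0.25
x1, x2 = 2.0, 4.0

p_before = inv_logit(beta0 + beta1 * x1 + beta2 * x2)
# Raise x1 by one unit while holding x2 fixed.
p_after = inv_logit(beta0 + beta1 * (x1 + 1.0) + beta2 * x2)

# The odds change by the factor exp(beta1), whatever the starting values.
odds_multiplier = odds(p_after) / odds(p_before)
```

Note that the probability itself does not change by a constant factor; only the odds do, which is why coefficients in such models are usually reported as odds multipliers.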


2. Polytomous Response Model 

If the response of an individual or item is restricted to one of a fixed set of
possible values, we say that the response is polytomous. The binary response model
is a special case of the polytomous response model. In the development of models for
a polytomous response variable, we need to know its underlying measurement scale.
Many methods are available for modelling a nominal scaled response variable
(performance grade) but will not be discussed here [Ref. 2:p. 216]. In this thesis,
methods for modelling an ordinal scaled response variable (CEP) are presented.

When response categories have a natural ordering, logit models should
utilize that ordering. A familiar example of an ordinal response category is the rating
scales used in food testing and wine tasting.


a. Cumulative Logit Model - Proportional Odds Model 
All the K-1 cumulative logits for a K-category response variable are
incorporated into a single, parsimonious model. The simplest models in this class
involve parallel regressions on the chosen scale, such as

\[ \log\left\{\frac{\gamma_j(\mathbf{x})}{1 - \gamma_j(\mathbf{x})}\right\} = \theta_j - \boldsymbol{\beta}^{T}\mathbf{x}, \qquad j = 1, \ldots, K-1, \tag{4} \]

where $\gamma_j(\mathbf{x}) = \mathrm{pr}(Y \le j \mid \mathbf{x})$ is the cumulative probability up to and including
category j, when the covariate vector is $\mathbf{x}$. The negative sign in (4) is a convention
ensuring that large values of $\boldsymbol{\beta}^{T}\mathbf{x}$ lead to an increase of probability in the higher
numbered categories. Both $\boldsymbol{\theta}$ and $\boldsymbol{\beta}$ in (4) are treated as unknown, and $\boldsymbol{\theta}$ must satisfy
$\theta_1 \le \theta_2 \le \ldots \le \theta_{K-1}$ [Ref. 4:p. 153]. Model (4) is known as the proportional-odds
model because the ratio of the odds of the event $Y \le j$ at $\mathbf{x} = \mathbf{x}_1$ and $\mathbf{x} = \mathbf{x}_2$ is

\[ \frac{\gamma_j(\mathbf{x}_1)/(1 - \gamma_j(\mathbf{x}_1))}{\gamma_j(\mathbf{x}_2)/(1 - \gamma_j(\mathbf{x}_2))} = \exp\{-\boldsymbol{\beta}^{T}(\mathbf{x}_1 - \mathbf{x}_2)\}, \tag{5} \]

which is independent of the choice of category (j). The odds ratio of cumulative
probabilities in (5) is called a cumulative odds ratio. The log of the cumulative odds
ratio is proportional to the distance between the values of the explanatory variables,
with the same proportionality constant applying to each cutpoint. Its interpretation is
that the odds of making response $\le j$ are $\exp\{-\boldsymbol{\beta}^{T}(\mathbf{x}_1 - \mathbf{x}_2)\}$ times higher at $\mathbf{x} = \mathbf{x}_1$ than
at $\mathbf{x} = \mathbf{x}_2$.
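
The defining property of the proportional-odds model, that the cumulative odds ratio is the same at every cutpoint j, can be checked with a short sketch. The cutpoints, coefficients, and covariate vectors below are hypothetical values for a four-category response; they are not fitted values from the thesis.

```python
import math


def cumulative_probs(thetas, beta, x):
    """Cumulative probabilities pr(Y <= j | x) for j = 1..K-1.

    Uses logit(gamma_j) = theta_j - beta'x, as in the proportional-odds model.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    return [1.0 / (1.0 + math.exp(-(t - eta))) for t in thetas]


def category_probs(thetas, beta, x):
    """Probabilities of the K individual categories, by differencing."""
    gammas = cumulative_probs(thetas, beta, x) + [1.0]
    probs, previous = [], 0.0
    for g in gammas:
        probs.append(g - previous)
        previous = g
    return probs


def cumulative_odds_ratios(thetas, beta, xa, xb):
    """Odds of Y <= j at xa divided by the odds at xb, for each cutpoint j."""
    ga = cumulative_probs(thetas, beta, xa)
    gb = cumulative_probs(thetas, beta, xb)
    return [(a / (1.0 - a)) / (b / (1.0 - b)) for a, b in zip(ga, gb)]


# Hypothetical parameters: K = 4 categories, two covariates.
thetas = [-1.0, 0.5, 2.0]       # increasing cutpoints
beta = [0.8, -0.3]
xa, xb = [1.0, 2.0], [0.0, 1.0]

ratios = cumulative_odds_ratios(thetas, beta, xa, xb)
expected = math.exp(-sum(b * (u - v) for b, u, v in zip(beta, xa, xb)))
probs_a = category_probs(thetas, beta, xa)
```

Every entry of `ratios` equals `expected`, regardless of the cutpoint, which is the sense in which the odds are "proportional".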


B. SURVIVAL ANALYSIS 

Statistical methods for survival analysis have evolved largely from biomedical and 
epidemiologic studies of humans and animals. Survival analysis is often used to 
analyze data on the length of time it takes for a specific event to occur. Survival time 
can be broadly defined as the time to the occurrence of a given event of interest. This 
event can be the death of a person, animal, or insect; or the termination of employment. 

Survival data may include subjects in the study who have not experienced the 
event of interest at the end of the study or time of analysis. For instance, some patients 


may still be alive at the end of a study period. For these subjects, the exact survival 








times are unknown. These are called censored observations or censored times and can 
also occur when individuals are lost to follow-up, in that they fail to turn up for 
subsequent medical review after a period of study. It would be impractical to wait until 
every subject has died before conducting any analysis. This is an intrinsic characteristic 
of survival data. 

The attrition behaviour of military officers is analogous to what was described in 
the previous paragraph. The survival time of an officer is the length of service time 
prior to leaving the service and becoming a reserve. The officers that are still active 


at the end of the study period are treated as censored observations. 


1. Survival Functions 

In this analysis, it is assumed that the survival time of an officer is discrete
and represented by t (t = 1, 2, ..., 25), where t is the number of years of active service
prior to going into reserve. The values of t are rounded up to the next higher integer
value. Therefore, if an officer went into reserve after serving 3.4 years of active duty,
the survival time is 4 years.

If there are no censored observations, the survival function is estimated as
the proportion of officers surviving longer than t. With
S(t) = P(an individual survives longer than t), the estimate is

\[ \hat{S}(t) = \frac{\text{number of officers with survival time} > t}{\text{total number of officers}}. \tag{6} \]

When censored observations are present, the numerator of (6) cannot always be
determined. Nonparametric methods of estimating S(t) for censored data have to be
used instead. [Ref. 5:p. 86]
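
The uncensored estimate can be illustrated with a minimal sketch. The service lengths below are hypothetical, and the use of the ceiling function for the rounding rule is an assumption (it leaves an exact whole number of years unchanged, which the text does not address).

```python
import math


def survival_time(years_served):
    """Discrete survival time: years of service rounded up to an integer.

    Assumption: math.ceil is used, so exactly 3.0 years stays 3.
    """
    return math.ceil(years_served)


def empirical_survival(times, t):
    """Proportion of officers whose survival time is longer than t.

    Valid only when no observation is censored.
    """
    return sum(1 for s in times if s > t) / len(times)


# Hypothetical years of active service before going into reserve.
service = [3.4, 1.0, 6.2, 6.9, 12.5]
times = [survival_time(y) for y in service]
s4 = empirical_survival(times, 4)
```

Here the officer who served 3.4 years is counted as surviving 4 years, so he does not contribute to the proportion surviving longer than 4 years.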


2. Nonparametric Methods of Estimating Survival Functions 

Many authors use the term life-table estimates for the product-limit (PL) 
estimates. The only difference is that the PL estimate is based on individual survival 
times while in the life-table method survival times are grouped into intervals. The PL 
estimate can be considered as a special case of the life-table estimate where each 
interval contains only one observation. It is more convenient to perform life table 
analysis when the data have already been grouped into intervals or the sample size is 
huge, say in the thousands. 

The conditional proportion dying ($\hat{q}_i$) is defined as $d_i/n_i$ for $i = 1, \ldots, s-1$, and
$\hat{q}_s = 1$, where $d_i$ is the number of individuals who die in the ith interval and $n_i$ is the
number of individuals who are exposed to risk in the ith interval.* It is an estimate of
the conditional probability of death in the ith interval given exposure to the risk of
death in the ith interval. The estimate of the cumulative proportion surviving (survival
function) at $t_i$ is given by

\[ \hat{S}(t_i) = \prod_{j=1}^{i-1} (1 - \hat{q}_j). \tag{7} \]

* The number of individuals entering the first interval, $n'_1$, is the total sample size. For
subsequent intervals, the number of individuals entering the ith interval is equal to the
number of individuals studied at the beginning of the previous interval minus those who
are lost to follow-up, are withdrawn alive, or have died in the previous interval.





This estimate was derived by Kaplan and Meier (1958), and in practice is
often referred to as the Kaplan-Meier estimate. [Ref. 6]
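
A life-table computation of the kind described above can be sketched as follows. The counts are hypothetical, and the convention that officers withdrawn (censored) during an interval count as half-exposed, so that the number exposed to risk is the number entering minus half the withdrawals, is the usual actuarial assumption and is not stated explicitly in the text. The sketch also omits the convention that the conditional proportion dying is set to 1 in the last interval.

```python
def life_table(n_start, deaths, withdrawn):
    """Build a simple life table.

    Returns one tuple per interval:
    (number entering, number exposed to risk, conditional proportion dying,
     cumulative proportion surviving at the start of the interval).

    Assumption: exposed = entering - withdrawn / 2 (actuarial adjustment).
    """
    rows = []
    entering = n_start
    surviving = 1.0
    for d, w in zip(deaths, withdrawn):
        exposed = entering - w / 2.0
        q = d / exposed if exposed > 0 else 0.0
        rows.append((entering, exposed, q, surviving))
        surviving *= 1.0 - q      # product of (1 - q_j) over earlier intervals
        entering -= d + w         # those who died or withdrew do not re-enter
    return rows


# Hypothetical cohort of 100 officers followed over three yearly intervals.
rows = life_table(100, deaths=[10, 9, 4], withdrawn=[0, 0, 10])
```

In this example the cumulative proportion surviving is 1.0 at the start of the first interval, 0.9 at the start of the second, and 0.81 at the start of the third.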


3. Hazard Function 


The hazard function for the ith interval, estimated at the midpoint $t_{mi}$, is

\[ \hat{h}(t_{mi}) = \frac{d_i}{b_i\,(n_i - \tfrac{1}{2} d_i)} = \frac{2\hat{q}_i}{b_i\,(1 + \hat{p}_i)}, \qquad i = 1, \ldots, s-1, \tag{8} \]

where

• $b_i$ is the width of the ith interval,
• $d_i$ is the number of individuals who died in the ith interval,
• $n_i$ is the number of individuals who are exposed to risk in the ith interval,
• $\hat{p}_i$ is the conditional proportion surviving and is defined as $\hat{p}_i = 1 - \hat{q}_i$, which is an
estimate of the conditional probability of surviving in the ith interval, and
• $\hat{q}_i$ is the conditional proportion dying and is defined as the ratio of $d_i$ over $n_i$.

Equation (8) is the number of deaths per unit time in the interval divided by
the average number of survivors at the midpoint of the interval. The hazard function
is also commonly known as the instantaneous failure rate. It is a measure of the risk
of failure at a point in time during the aging process.
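
The hazard estimate can be written either in terms of counts or in terms of the conditional proportions dying and surviving; the two forms agree because dividing the numerator and denominator of the count form by the number exposed to risk gives the proportion form. A minimal sketch with hypothetical counts:

```python
def hazard_from_counts(d, n, b):
    """Deaths per unit time divided by average survivors at the midpoint."""
    return d / (b * (n - d / 2.0))


def hazard_from_proportions(q, b):
    """Same estimate expressed with q (proportion dying) and p = 1 - q."""
    p = 1.0 - q
    return 2.0 * q / (b * (1.0 + p))


# Hypothetical interval: 9 leavers among 90 officers exposed, width 1 year.
d, n, b = 9, 90.0, 1.0
h_counts = hazard_from_counts(d, n, b)
h_props = hazard_from_proportions(d / n, b)
```

Both expressions give 9 / 85.5, roughly 0.105 departures per officer-year for this interval.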








Before proceeding to analyze the data using the various techniques introduced 
earlier, some mention of the data set is desirable. This is taken care of in the following 


chapter. 








III. DATA OVERVIEW

This chapter gives a brief description of the data set that consists of about 17000 
records of individual officer’s characteristics. This data set contains records of both 
Singapore’s active and reserve officers for the period from 1959 to 1992. 

A. POPULATION 

The models for CEP estimation and performance prediction consider both the 
male and female officers who were still on active duty on 31 Dec 1992. Since the
female population is relatively small compared to the male counterparts, the study does 
not discriminate between the two sexes. Out of the total of about 17000 records, about 
30% of them are still active. 

Table 1 shows the distribution of actual CEP of the active officers from 1990 to 
1992. Table 2 shows the distribution of actual annual performance of the active officers 
for the same period. 

From the two tables, it can be observed that the percentages of individuals in each 
response category over the three years are more or less the same. Additional two-way 
tables of CEP and performance as a function of educational level, award, age group, 


length of service, and rank seniority are found in Appendix A. 











Table 1. CEP DISTRIBUTION FROM THE YEAR 1990 TO 1992

[Table body not recoverable from the source.]

Table 2. PERFORMANCE DISTRIBUTION FROM THE YEAR 1990 TO 1992

PERFORMANCE GRADE | FREQUENCY (PERCENT): 1990, 1991, 1992

[Table body not recoverable from the source.]

The figures in brackets represent the numeric score given on the performance
appraisal form.








B. COVARIATES 
There are altogether eight covariates considered in this study. Except for length
of service, rank seniority and age, which are continuous variables, all the remaining
covariates are categorical. Here is a brief description of the covariates:


• Education Level - The education level of the officers varies from the Cambridge
General Certificate of Education (GCE) ’O’ level to Doctorate. About 86% of the
active officers have at least a GCE ’A’ level or diploma qualification. Thirty-three
percent of the active officers have at least a graduate degree.

• Academic or Overseas Military Training Awards - About 30% of the officers
received some form of academic or overseas military training award. Overseas
military training awards include Sandhurst (United Kingdom), West Point (United
States), and the Naval Academy (United States), to name a few. Academic training
awards include both local and overseas universities.

• Rank - ’Rank’ is the rank of an officer as of 31 Dec 1992. It ranges from the
rank of Lieutenant to the rank of Major General.

• Length of Service - The length of service (measured in years) is computed from
the year an officer first enters the military service as a recruit to 1992.

• Rank Seniority - Rank seniority is the number of years an officer has been in his
most recent rank since his last promotion.

• Age - ’Age’ is the age of the officer.

• Salary Grade - The salary grade ranges on an ascending scale of 1 to 10. A
higher grade within a rank means higher remuneration for an officer.








C. CODE BOOK 


A code book for the individual officer’s characteristics is given in Table 3. 


Table 3. CODE BOOK FOR INDIVIDUAL OFFICER’S CHARACTERISTICS

VARIABLE    | UNITS | SCALE   | COMMENTS
[illegible] |       |         | Officers numbered sequentially
[illegible] |       |         | 0 = unknown; 1 = GCE ’O’ or equiv. and below;
            |       |         | 2 = GCE ’A’ or equiv.; 3 = Diploma and Adv. Diploma;
            |       |         | 4 = General Degree; 5 = Honors Degree;
            |       |         | 6 = Masters Degree; 7 = Doctorate
[illegible] |       | nominal | [code illegible] = academic or military training award
[illegible] |       | ratio   | Length of service as at 31 Dec 92
[illegible] |       | ratio   | Number of years in the rank held since last promotion
AGE         | years | ratio   | Age as at 31 Dec 1992
[illegible] | none  | ordinal | Salary grade in ascending order from 1 to 10
C89 to C92  |       |         | Current Estimated Potential, 1989 to 1992:
            |       |         | 1 = CPT; 2 = Snr. CPT; 3 = MAJ; 4 = Snr. MAJ;
            |       |         | 5 = LTC; 6 = Snr. LTC; 7 = COL; 8 = Snr. COL;
            |       |         | 9 = BG; 10 = Snr. BG; 11 = MG
P89 to P92  |       |         | Performance Appraisal, 1989 to 1992


The code book is used for cross-reference when the meaning of a number in the
data set is unclear. This is the most important document in the data
preparation phase. Once the code book has been prepared we can proceed to analyze
the data. The next two chapters analyze the data set using the logistic regression
technique.














IV. BINARY RESPONSE MODEL 


A. CURRENT ESTIMATED POTENTIAL 

The primary goal is to determine the covariates that best explain the variation in the CEP of an officer. The stepwise regression technique is used for variable selection. The significance levels for entering and staying in the model are set at $\alpha$ = 0.10 and 0.12 respectively.

The response variable is the CEP for the year 1992 (denoted by CEP92). A response value of zero (Y=0) means a CEP estimate of MAJ and below, while a response value of one (Y=1) means a CEP estimate of LTC and above. This classification is chosen because it divides the population under study into two groups of approximately equal size (see Table 1 on page 18). In the process of model building, three sets of candidate covariates are investigated. They are

• Education level, training award, rank, length of service, rank seniority, age, salary grade, CEP grades from 1989 to 1991, and performance grades from 1989 to 1991,

• Education level, training award, rank, length of service, rank seniority, age, salary grade, CEP for the year 1991, and performance grade for the year 1991, and

• Education level, training award, rank, length of service, rank seniority, age, and salary grade.





A detailed comparison of the models derived from the above three covariate combinations is presented in Section A of Appendix B. In this analysis, an event occurs when an officer is classified as having a CEP estimate of MAJ and below, and a non-event when the officer has a CEP estimate of LTC and above. For convenience, the MAJ and below group is designated MAJ, and the LTC and above group LTC. This convention is adopted throughout this thesis.


The probability of being classified as MAJ is estimated by 


\[ P_{MAJ} = \frac{\exp[6.396 - 0.19E - 2.26R - 0.21S + 0.22A - 0.18(P91) - 1.45(C91)]}{1 + \exp[6.396 - 0.19E - 2.26R - 0.21S + 0.22A - 0.18(P91) - 1.45(C91)]}. \]





Conversely, the probability of being classified as LTC is estimated by 


\[ P_{LTC} = \frac{1}{1 + \exp[6.396 - 0.19E - 2.26R - 0.21S + 0.22A - 0.18(P91) - 1.45(C91)]}, \]





where 
• E is educational level,
• R is current rank as at 31 Dec 1992,
• S is number of years in current rank since last promotion,
• A is age (in years) as at 31 Dec 1992,
• P91 is performance appraisal for the year 1991, and
• C91 is current estimated potential for the year 1991.


A unit increase in the educational level multiplies the odds of being classified as MAJ by a factor of 0.82; that is, it reduces those odds by about 18%. In other words, the higher the educational level of an officer, the more likely he or she is to belong to LTC. Similarly, the higher the rank, rank seniority, performance grade and CEP in the previous year, the higher the probability that an officer belongs to the LTC group. In contrast, a unit increase in age increases the probability of an officer belonging to the MAJ group.
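The fitted CEP model can be evaluated numerically as a check on the interpretation above. The following is a minimal sketch rather than the thesis's SAS program: the coefficients are those printed in the two equations, while the function names and the covariate codes of the example officer are hypothetical and chosen only for illustration.

```python
import math

# Coefficients as printed in the text for the CEP binary response model.
# The event modelled is "MAJ" (CEP estimate of MAJ and below).
CEP_INTERCEPT = 6.396
CEP_COEF = {"E": -0.19, "R": -2.26, "S": -0.21, "A": 0.22, "P91": -0.18, "C91": -1.45}


def linear_predictor(officer):
    """Fitted logit: intercept plus coefficient times covariate, summed over covariates."""
    return CEP_INTERCEPT + sum(CEP_COEF[name] * officer[name] for name in CEP_COEF)


def prob_maj(officer):
    """Estimated probability of the MAJ group: exp(eta) / (1 + exp(eta))."""
    eta = linear_predictor(officer)
    return math.exp(eta) / (1.0 + math.exp(eta))


def prob_ltc(officer):
    """Estimated probability of the LTC group: 1 / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(linear_predictor(officer)))


def odds_ratios():
    """Multiplicative change in the odds of MAJ per unit increase in each covariate."""
    return {name: math.exp(beta) for name, beta in CEP_COEF.items()}


# Hypothetical officer: the numeric codes below are illustrative assumptions only.
officer = {"E": 4, "R": 3, "S": 2, "A": 30, "P91": 8, "C91": 4}
p_maj = prob_maj(officer)
p_ltc = prob_ltc(officer)
```

Every covariate with a negative coefficient has an odds ratio below one and so pushes an officer towards the LTC group; age, the only positive coefficient, has an odds ratio above one.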


B. PERFORMANCE 

For this model, the response variable is the performance grade for the year 1992 
(denoted by PERF92). A response value of zero (Y=0) means a performance grade of 
B minus and below while a response value of one (Y=1) means a performance grade 
of at least a B. Like the CEP model, the same three covariate combinations are 
investigated. Again, for convenience, a response value of zero is designated as Group 
I while a response value of one is designated as Group II. 

Coincidentally, the model selected is again derived from the second covariate combination. A comparison of the models derived from the three covariate combinations is given in Section B of Appendix B.


The probability of being classified as Group I is estimated by 


\[ P_{I} = \frac{\exp[7.1631 + 1.1R - 0.28S - 0.44(P91) - 0.79(C91)]}{1 + \exp[7.1631 + 1.1R - 0.28S - 0.44(P91) - 0.79(C91)]}. \]


Conversely, the probability of being classified as Group II is estimated by 


\[ P_{II} = \frac{1}{1 + \exp[7.1631 + 1.1R - 0.28S - 0.44(P91) - 0.79(C91)]}, \]

where

• R is current rank as at 31 Dec 1992,
• S is number of years in current rank since last promotion,
• P91 is performance appraisal for the year 1991, and
• C91 is current estimated potential for the year 1991.


An interesting result is that a one-level increase in the rank of an officer multiplies the odds of getting a performance grade of B minus and below by a factor of about three. In other words, as an officer is promoted to the next rank, his annual performance grade is more likely to be poorer than the grades he received in his previous rank. The remaining three covariates in the model, however, have the reverse effect.
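The factor of three follows directly from the fitted coefficients: in a logistic model, a one-unit increase in a covariate multiplies the odds of the event by the exponential of its coefficient. A short worked check, using only the coefficients printed for the Group I model (results rounded to two decimals):

\[ \frac{\mathrm{odds}(R+1)}{\mathrm{odds}(R)} = e^{1.1} \approx 3.00, \qquad e^{-0.28} \approx 0.76 \ (S), \qquad e^{-0.44} \approx 0.64 \ (P91), \qquad e^{-0.79} \approx 0.45 \ (C91). \]

The three factors below one are the "reverse effect": each unit increase in rank seniority, previous performance grade or previous CEP lowers the odds of a grade of B minus and below.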


C. EVALUATION OF THE MODEL 

In statistical model building, the investigator needs to know how far the predictions derived from the model can be trusted. The question commonly asked is: can the model predict correctly a high proportion of the time? Statistical significance does not necessarily mean that the model will predict well, since the significance measures are themselves based on the model. Very often, results that are statistically significant do not predict well when implemented in the real world.

Equation (3) on Page 10 is the linear logistic model given in terms of the probability of a positive response (i.e., an event). In order to classify the officers into the two groups, a cutoff point must be determined, usually by graphical means. This cutoff point is a probability between 0 and 1, chosen so that a high percentage of correct prediction is achieved for both groups. An officer is classified into the MAJ group (for the performance model: performance grade of B minus and below) if the estimated probability of an event is greater than or equal to the cutoff point. The classification table in the SAS output (see Appendix C) provides information on sensitivity*, specificity+, false positive rate** and false negative rate++.
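The four quantities in the classification table can be computed from the fitted probabilities and the observed outcomes once a cutoff is fixed. The sketch below follows the definitions given in the footnotes, in which the false positive and false negative rates are proportions of the predicted responses rather than of the observed groups. The function name and the ten data points are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_table(probs, events, cutoff):
    """Classify as event when prob >= cutoff and summarise the 2 x 2 table.

    probs  : estimated event probabilities
    events : 1 if the event was observed, 0 otherwise
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, events):
        predicted_event = p >= cutoff
        if predicted_event and y == 1:
            tp += 1
        elif predicted_event and y == 0:
            fp += 1
        elif not predicted_event and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),          # events predicted as events
        "specificity": _ratio(tn, tn + fp),          # non-events predicted as non-events
        "false_positive_rate": _ratio(fp, tp + fp),  # predicted events that were non-events
        "false_negative_rate": _ratio(fn, tn + fn),  # predicted non-events that were events
    }


# Hypothetical fitted probabilities and observed outcomes (1 = event).
probs = [0.90, 0.80, 0.70, 0.45, 0.30, 0.60, 0.35, 0.20, 0.10, 0.05]
events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
table = classification_table(probs, events, cutoff=0.40)
```

Repeating the call over a grid of cutoffs and plotting sensitivity and specificity against the cutoff gives the kind of percent-correct plot used to choose the cutoff graphically.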


1. Current Estimated Potential 

Naturally, one would wish the percentage correctly classified in each group to be as close to 100% as possible. Figure 1 plots the percentage correctly classified against the cutoff point. For example, at a cutoff point of about 0.40, approximately 87% of each group is correctly classified. This may be a good choice of cutoff point because it treats both groups equally. In contrast, a cutoff point of 0.04 would result in 99% of the MAJ group being classified correctly but only about 29% of the LTC group.

The receiver operating characteristic (ROC) curve is a plot of the proportion of events (MAJ group) correctly classified as events against the proportion of non-events (LTC group) incorrectly classified as events. Similarly, we


*Sensitivity is the proportion of events that were predicted to be events.
+Specificity is the proportion of non-events that were predicted to be non-events.
**False positive rate is the proportion of predicted event responses that were observed as non-events.
++False negative rate is the proportion of predicted non-event responses that were observed as events.

Figure 1: Percentage of Individuals with their CEP Correctly Classified (Binary Response Model).


could also plot the proportion of non-events correctly classified as non-events against the proportion of events incorrectly classified as non-events. Figure 2 gives these two ROC curves. In the top plot of Figure 2, the upper curve is the actual curve obtained from the prediction of an event based on the six variables selected by the stepwise procedure (i.e., education level, rank, rank seniority, age, previous year's performance grade and CEP estimate). The straight line is the hypothetical curve for chance-alone assignment (i.e., flipping a fair coin). Likewise, the upper curve of the bottom plot in Figure 2 is the actual curve obtained from the prediction of an officer being classified as LTC.
From the plots in Figure 2, one can see that the model derived predicts well. If a cutoff point of 0.4 is used, 87% of both groups are correctly classified, with a false positive rate of 16% and a false negative rate of 11%. In other words, 16% of the officers predicted to be in the MAJ group actually belong to the LTC group, whereas 11% of the officers predicted to be in the LTC group actually belong to the MAJ group.
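An ROC curve of the kind described here can be traced by sweeping the cutoff over the fitted probabilities and recording, at each cutoff, the proportion of events and of non-events classified as events. The sketch below is illustrative only: the data are hypothetical and the area under the curve is computed by the trapezoidal rule, a summary the thesis itself does not report.

```python
def roc_points(probs, events):
    """Return (proportion of non-events called events, proportion of events called events)."""
    n_event = sum(events)
    n_nonevent = len(events) - n_event
    points = [(0.0, 0.0)]  # cutoff above every probability: nothing classified as event
    for cutoff in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, events) if p >= cutoff and y == 1)
        fp = sum(1 for p, y in zip(probs, events) if p >= cutoff and y == 0)
        points.append((fp / n_nonevent, tp / n_event))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to chance-alone assignment."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical fitted probabilities and observed outcomes (1 = event).
probs = [0.90, 0.80, 0.70, 0.45, 0.30, 0.60, 0.35, 0.20, 0.10, 0.05]
events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
curve = roc_points(probs, events)
auc = area_under_curve(curve)
```

The further the curve lies above the straight chance line, the larger the area and the better the model separates the two groups.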
Figure 2: ROC Curves for CEP Binary Response Model.


2. Performance 
For this model, Group I refers to officers who have a performance grade of B minus and below, while Group II refers to those with a performance grade of at least a B. As can be seen from Figure 3, a cutoff point of about 0.64 would result in approximately 74% of each group being correctly classified. In contrast, a cutoff point of 0.2 would result in 98% of Group I being classified correctly but only about 28% of Group II. Too high a cutoff point, for instance 0.8, would result in about 45% of Group I being classified correctly but about 91% of Group II. Hence, the cutoff value should be chosen so that each group has a high percentage of correct classification.


Figure 3: Percentage of Individuals with their Performance Correctly Classified (Binary Response Model).


From Figure 4, one can see clearly that the model derived does not predict as well as the CEP model. A cutoff point of 0.64 would give about three quarters of both groups being correctly classified, with a corresponding false positive rate of 18% and a false negative rate of 36%. In other words, the proportion of officers predicted to be in Group II who actually belong to Group I is twice the proportion of officers predicted to be in Group I who actually belong to Group II.

Figure 4: ROC Curves for Performance Binary Response Model.


The binary response model is the simplest model of the Linear Logistic Regression technique. In the following chapter, we will use a polytomous response model to consider response variables having more than two levels.

V. POLYTOMOUS RESPONSE MODEL

Valuable information is lost when binary response models are used to model a response variable having more than two levels. The numerous levels of the response variable (CEP and performance) are collapsed into two mutually exclusive levels. The power of the binary response model is realized when the response variable naturally has two levels, for example, officers being promoted or not promoted. Hence, for the CEP and performance models, it is essential to develop polytomous response models if more efficient discrimination of the officers is desired.

The candidate covariates considered in the model building are education level, training award, current rank, length of service, rank seniority, age, salary grade, and the previous year's (1991) annual performance grade and CEP estimate. The stepwise regression technique is again employed for variable selection. The significance levels for entering and staying in the model are set at $\alpha$ = 0.10 and 0.12 respectively. The cumulative logit model in SAS is used and it has the form


\[ \log\left\{ \frac{\gamma_j(x)}{1 - \gamma_j(x)} \right\} = \theta_j + \beta^{T}x \qquad (13) \]

where $\gamma_j(x) = \mathrm{pr}(Y \le j \mid x)$ is the cumulative probability up to and including category j when the covariate vector is x. Compared with (4) on Page 11, the sign of $\beta^{T}x$ in (13) is reversed. Hence, the signs of the parameter estimates obtained from the SAS logistic procedure (using the cumulative logit model) must be reversed when (4) is used.
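Under form (13), the probability of each individual response category is obtained by differencing successive cumulative probabilities. The sketch below illustrates this with the three intercepts reported in Table 4 for the CEP model. The value of the linear term `eta` (standing for $\beta^{T}x$) is purely hypothetical, and the function names are not taken from the thesis.

```python
import math


def logistic(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, eta):
    """Category probabilities for a cumulative logit model in the form of (13).

    intercepts : increasing theta_j, one per category except the last
    eta        : value of the linear term beta'x (sign convention of (13))
    """
    cumulative = [logistic(theta + eta) for theta in intercepts] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs


# Intercepts INTERCEP1..INTERCEP3 as printed in Table 4 (four CEP levels).
cep_intercepts = [-0.2548, 4.1478, 9.7599]

# Hypothetical linear term; a real value would come from the fitted slopes.
probs = category_probabilities(cep_intercepts, eta=-3.0)
```

With this sign convention, a more negative `eta` moves probability towards the higher-numbered categories, which is why covariates associated with a higher CEP carry negative estimates in the SAS output.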


A. CURRENT ESTIMATED POTENTIAL 

We look first at Current Estimated Potential. The response variable is the CEP for the year 1992, which has four levels (CPT, MAJ, LTC, and COL and above), denoted by 1, 2, 3, and 4 respectively in the SAS program (see Appendix C, Section B). The resulting parameter estimates from SAS are given in Table 4.

It is interesting to note that the set of covariates that entered the polytomous response model is the same as that for the binary response model. Further, the signs of the $\beta$s in the two models are the same.

The results show that as education level, current rank, rank seniority, and the previous year's annual performance grade and CEP estimate get higher, there is a tendency towards the higher-numbered categories; that is, the officer is more likely to have a high CEP estimate. Age, however, has the reverse effect.


B. PERFORMANCE 
In the study of performance, the response variable is the annual performance grade for the year 1992. The original 15 levels (E-, E, ..., A, A+) are collapsed to five levels representing A, B, C, D, and E grades (e.g., A-, A, and A+ are collapsed to form A, and so on). The SAS program can be found in Appendix C, Section B. The parameter estimates given by SAS are presented in Table 5.
Table 4. PARAMETER ESTIMATES FOR THE CEP MODEL

             Parameter   Standard   Pr >
Variable     Estimate    Error      Chi-Square
INTERCEP1    -0.2548     1.0019     0.7992
INTERCEP2     4.1478     1.0083     0.0001
INTERCEP3     9.7599     1.0848     0.0001
[The rows for EDU, RANK, RSNR, AGE, P91 and C91 are illegible in the source.]


1 EDU is education level
2 RANK is current rank
3 RSNR is rank seniority
4 AGE is age of officer
5 P91 is annual performance grade in the previous year (1991)
6 C91 is CEP estimate in the previous year (1991)

As in the CEP study, the set of significant covariates that entered the polytomous response model is the same as that for the binary response model, although the estimates naturally differ between the two models. Both the polytomous and binary response models give consistent results with regard to the interpretation of the $\beta$s.

The results show that the longer an officer remains in a particular rank, and the higher the previous year's annual performance grade and CEP
estimate, the more likely it is for him to receive a high performance grade in the current year's assessment. However, as an officer gets promoted to the next rank, there is a tendency for him to receive a poorer annual performance grade than the grades he received before promotion. This could be a direct consequence of having quotas on the performance grades.


Table 5. PARAMETER ESTIMATES FOR THE PERFORMANCE MODEL

             Parameter   Standard   Pr >
Variable     Estimate    Error      Chi-Square
INTERCEP1     1.2353     0.4346     0.0045
INTERCEP2     2.0194     0.4213     0.0001
INTERCEP3     5.7756     0.4677     0.0001
INTERCEP4     [values and remaining rows illegible in the source]

C. EVALUATION OF MODEL

To evaluate the models, the population is divided into two groups. The first group (Population I) is used for estimating the parameters, while the second group (Population II) is used to assess the prediction quality of the model developed.











For an ordinal response, the LOGISTIC procedure in SAS performs a test of the parallel lines assumption. In the output, this test is labeled "Score Test for the Proportional Odds Assumption" when the logistic link function is selected. The null hypothesis is that the slope parameters are the same across the cumulative logits, against the alternative hypothesis that at least one pair of slope parameters differ.


1. Current Estimated Potential 

The chi-square score for testing the proportional odds assumption is 133.1061, which is significant with respect to a chi-square distribution with 12 degrees of freedom (p=0.0001). This indicates that a proportional odds model may not be entirely appropriate for the data. Nevertheless, results show that the model developed has a 78 percent correct prediction capability. When the model is tested on Population II, about 82 percent of the officers in the group are classified correctly. Considering that the model must now distinguish four response levels rather than the two levels of the binary response model, this is a reasonably good prediction model.
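The reported significance can be checked by hand, because the upper-tail probability of a chi-square distribution has a closed form when the degrees of freedom are even. The sketch below applies that closed form to the score statistic quoted above. The function name is hypothetical, and reading the printed p=0.0001 as the smallest value SAS displays is an assumption. The 12 degrees of freedom are consistent with six covariates times (four response levels minus two), which is the usual count for this score test, although the text does not state that derivation.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variate; closed form valid only for even df.

    For df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Score statistic for the proportional odds assumption, CEP model, 12 df.
p_cep = chi2_upper_tail_even_df(133.1061, 12)
```

The resulting probability is far smaller than 0.0001, so the conclusion that the proportional odds assumption is rejected does not depend on how the p-value was rounded for printing.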


2. Performance 
In the study of performance, the chi-square score for testing the proportional odds assumption is 125.2833, which is again significant with respect to a chi-square distribution with 12 degrees of freedom (p=0.0001). The model is capable of correctly classifying about 68 percent of the officers in both Population I and II. Bearing in mind that the response variable now has five levels, this model can be considered reasonably good.

In this and the previous chapter, we have seen how the Logistic Regression technique may be used to estimate CEP and predict the performance grade of the officers. Next, we analyze the attrition behaviour of officers who entered service during the periods 1965-70 (the first cohort), 1971-76 (the second cohort), and 1977-82 (the third cohort).


VI. SURVIVAL ANALYSIS 

This chapter compares and analyzes the attrition patterns of officers who entered service during the periods 1965-70, 1971-76, and 1977-82. Officers who entered service before 1965 are not considered because there are only about a dozen of them. Officers who entered service after 1982 are not considered either, because the number of years available for study is less than half of that for the first cohort (i.e., those who entered service during 1965-70).

The attrition behaviour is analysed as a function of one covariate at a time. The covariate effects considered are graduates against non-graduates, education (five levels), academic or overseas military training award holders against non-award holders, support vocations, and service groups.

The Singapore military has a very short history. It was formed after Singapore became independent in 1965. During the first few years, there were very few naval officers and pilots; almost all the officers were in the Army. Hence, for the support vocation and service group effects, the study does not distinguish between the cohorts. Rather, a global view of the entire population is taken.

Graphical study of the survival functions is used for the comparative analyses. This approach gives a clear picture of how the various survival functions differ. The significance of the differences between survival functions is evaluated using formal statistical tests such as the Log-Rank and Wilcoxon tests [Ref.5, Chap 5].
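The survival curves compared in this chapter are estimates of the proportion of officers still in service after a given number of years, with officers who were still serving at the end of the study treated as censored. A minimal sketch of the product-limit estimate is given below. It is not the SAS program used for the thesis, and the function name and the eight service records are hypothetical.

```python
def product_limit(durations, left_service):
    """Product-limit estimate of the survival function.

    durations    : years of service observed for each officer
    left_service : True if the officer left (event), False if censored
    Returns a list of (time, survival) pairs, one per distinct attrition time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, left_service) if d == t and e)
        censored = sum(1 for d, e in zip(durations, left_service) if d == t and not e)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Officers who left or were censored at t are no longer at risk afterwards.
        at_risk -= deaths + censored
    return curve


# Hypothetical records: years served and whether the officer left the service.
durations = [2, 3, 3, 5, 6, 6, 8, 10]
left_service = [True, True, False, True, True, False, False, True]
curve = product_limit(durations, left_service)
```

Censored officers contribute to the number at risk up to their last observed year but never cause a drop in the curve, which is how the estimate makes use of incomplete service histories.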
A. NON-GRADUATE AGAINST GRADUATE

The graduate group is defined as those officers who have attained at least an undergraduate degree. Survival functions for the three enlistment periods are shown in Figure 5. There appears to be no significant difference between the non-graduate and graduate officers. The Log-Rank and Wilcoxon tests are both consistent with this visual observation.
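For two groups, the Log-Rank test compares the number of officers observed to leave in one group at each attrition time with the number expected if both groups shared the same survival function. The sketch below implements the standard two-sample statistic and its one-degree-of-freedom p-value; the Wilcoxon test is not implemented. The function names and the small data sets are hypothetical.

```python
import math


def log_rank_statistic(group_a, group_b):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    Each group is a list of (duration, left_service) pairs, where
    left_service is True for an observed departure and False if censored.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d, _ in group_a if d >= t)  # at risk in group A
        n_b = sum(1 for d, _ in group_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in group_a if d == t and e)
        d_b = sum(1 for d, e in group_b if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical groups in which group A leaves earlier than group B.
group_a = [(1, True), (2, True)]
group_b = [(3, True), (4, True)]
statistic = log_rank_statistic(group_a, group_b)
p_value = p_value_one_df(statistic)
```

A small statistic, and therefore a large p-value, is what "no significant difference" between the non-graduate and graduate curves corresponds to.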
Figure 5: Survival Curves for Non-Graduates and Graduates.


B. EDUCATIONAL LEVEL 
Figure 6 shows survival curves for the various education levels. 'O-' and 'A-' levels represent officers who have a GCE 'O-' and 'A-' level qualification respectively. 'Diploma' represents officers who have only an Advanced or Basic Diploma. 'Undergrad' denotes officers who have an undergraduate degree, and 'Postgrad' denotes officers who have a postgraduate degree.

Figure 6: Survival Curves for Different Education Levels.


It is interesting to observe from Figure 6 that officers with an 'O-' or 'A-' level education have consistently survived longer in service than the others in all three cohorts. In contrast, officers with a diploma education consistently show the lowest survival function. For this group, there is a sharp drop in the survival function during the first two to three years of service, after which it decreases more or less steadily. The only exception is Cohort 3 (1977-82), where there is another sharp drop in the survival function after about nine to ten years in service. In terms of survival function, the officers with undergraduate degrees seem to rank below the 'O-' and 'A-' level officers but above those with postgraduate degrees.

From the top plot of Figure 6, it appears that, except for the officers with diploma and postgraduate education, the survival functions of the remaining groups of officers are more or less the same. When all education levels are included, the Log-Rank (p-value = 0.031) and Wilcoxon (p-value = 0.0108) tests for Cohort 1 (1965-70) are significant at the 0.05 level, and both tests give a p-value of 0.0001 for the other two cohorts, indicating a strongly significant difference in attrition behaviour among education levels. When the tests are recomputed without the officers with diploma qualification, education level is no longer a significant covariate for Cohort 1 at the 0.05 significance level (Log-Rank p-value 0.0805, Wilcoxon p-value 0.1109), which is consistent with the visual impression. For Cohorts 2 (1971-76) and 3 (1977-82), however, education level remains a significant covariate.

From the foregoing discussion it can be concluded that there is a significant difference in attrition behaviour between officers with diploma education and those with other educational qualifications. Among the other education levels ('O-' and 'A-' levels, undergraduates and postgraduates), the survival functions also differ significantly for Cohorts 2 and 3, though not for Cohort 1. However, attrition behaviour is a function of many other complex and uncontrollable factors such as civilian job market opportunities, the country's economy, inflation, and unemployment rates. The trends of the survival functions should therefore be viewed with caution.


C. NON-AWARD AGAINST TRAINING AWARD HOLDERS 

Officers who are given academic or overseas military training awards are expected to survive longer in service than those who are not. One simple reason is that officers given awards are required to sign an obligated service contract of between five and eight years, depending on the type of training award received. If the officer breaks this contract, he has to reimburse the Government the money invested in him. The survival functions are shown in Figure 7.

Figure 7: Survival Curves for Non-Award and Award Holders.

The difference in the survival functions between the two groups of officers is roughly the same for the three cohorts. This indicates a strong consistency in attrition behaviour across the three cohorts. Figure 8 plots the difference in survival functions between these two groups of officers for the three cohorts.

Figure 8: Difference in Survival Functions Between Non-Award and Award Holders.


D. SUPPORT VOCATION 
This study includes Engineering officers and Army and Air Force support officers. The Engineering category consists of Ordnance, Electric, Naval and Air Engineering officers. The Army support category consists of Signal, Artillery, Mechanical Transport, Armour 'Reccee' and Armour Infantry officers. The Air Force support category consists of Air Defence and Air Operations & Communication officers.

As shown in Figure 9, the survival function of the Army support officers exhibits an almost linear trend, which suggests a constant attrition rate. The survival functions of the Engineering and Air Force support officers could be pooled and described by a single two-piece linear function, since their attrition behaviours are roughly the same. For the first three years in service, both of these groups show a very sharp drop in the survival function compared with that of the Army support officers. After the third year of service, the slopes of the survival functions for the three categories of officers are more or less the same.

Figure 10 shows the hazard function estimates for the above three categories of officers. The attrition rate is highest in the first year of service for the Engineering and Air Force support officers, and drops to its lowest at the beginning of the third year. After the third year, the attrition rate of the Engineering officers is generally higher than that of the other two categories. In contrast, the Army support officers exhibit a relatively constant attrition rate throughout the entire period of study.
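The text does not say how the hazard function estimates were computed. One common choice for yearly attrition data is the actuarial (life-table) estimate, in which officers censored during an interval are counted as being at risk for half of it. The sketch below assumes that method; the function name and the six service records are hypothetical.

```python
def life_table_hazard(durations, left_service, width=1.0):
    """Actuarial hazard estimate at the midpoint of each service interval.

    durations    : years of service observed for each officer
    left_service : True if the officer left (event), False if censored
    width        : interval width in years
    Returns a list of (interval midpoint, hazard estimate) pairs.
    """
    n_intervals = int(max(durations) // width) + 1
    deaths = [0] * n_intervals
    censored = [0] * n_intervals
    for d, e in zip(durations, left_service):
        i = int(d // width)
        if e:
            deaths[i] += 1
        else:
            censored[i] += 1

    entering = len(durations)
    rows = []
    for i in range(n_intervals):
        effective = entering - censored[i] / 2.0          # censored count as half at risk
        denominator = width * (effective - deaths[i] / 2.0)
        hazard = deaths[i] / denominator if denominator > 0 else float("nan")
        rows.append((i * width + width / 2.0, hazard))
        entering -= deaths[i] + censored[i]
    return rows


# Hypothetical records: years served and whether the officer left the service.
durations = [0.5, 0.7, 1.2, 1.5, 2.3, 2.8]
left_service = [True, False, True, True, False, True]
hazard = life_table_hazard(durations, left_service)
```

A roughly flat sequence of hazard values corresponds to the constant attrition rate described for the Army support officers, while a spike in the first interval corresponds to the early losses described for the other two categories.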
Figure 9: Survival Curves for Three Support Vocations.

Figure 10: Hazard Function Estimates for Three Support Vocations.


E. SERVICE GROUPS 
The three service groups under study are Infantry and Guards (Army), Pilots (Air Force), and Naval (Navy) officers. The pilots are either on the pensionable scheme or on the 12-year contract scheme. Therefore, it is not surprising to find that they have the best survival among the service groups (see Figure 11) and that their attrition rate begins to escalate only after 12 years of service (see Figure 12).

Figure 11: Survival Curves for Different Service Groups.

The highest attrition rate occurs at year six for both the Army and Naval officers because of their six-year contracts, as opposed to the pilots, who have 12-year contracts. For the first six years of service, the Naval officers have a lower risk of leaving the service than their Army counterparts. After the first six years of service, the converse is true.


Figure 12: Hazard Function Estimates for Service Groups (hazard function estimates plotted against length of service in years).


In this chapter, we have seen how the Survival Analysis technique may be used to analyze the attrition behaviour of the officers in the Singapore military. The following chapter summarizes the conclusions from these and earlier findings, together with recommendations for future work.

VII. CONCLUSIONS


A. LOGISTIC REGRESSION ANALYSIS 
The Logistic Regression technique is frequently used for the analysis of data collected retrospectively. It is commonly used when an individual is to be classified into two or more categories. Amalgamating the response categories into two levels results in the loss of valuable information, and is discouraged if efficient discrimination of the response categories is desired.

The significant results of the study on CEP estimation and performance prediction are briefly outlined below.

• Education Level - Education level is not a significant predictor of performance, though a higher education level seems to indicate a higher CEP.

• Training Award - There is insufficient evidence to support the notion that officers given an academic or overseas military training award tend to have a better performance grade than those who did not receive any.

• Rank - The higher the rank of an officer, the more likely it is for him to get a poorer performance grade than when he was in the previous rank.

• Previous Year's CEP and Performance Grade - The current year's CEP estimate and performance grade are strongly related to the previous year's CEP and performance grade.

B. SURVIVAL ANALYSIS

An intrinsic characteristic of survival data is the presence of censored observations. It would be impractical to wait until every subject has "died" before conducting any analysis. The life-table or product-limit estimate of the survival function is an invaluable tool for analyzing attrition behaviour when censored observations are present in the data set.

The graphical approach to analyzing the survival function is a simple way of studying the problem without requiring a statistics background. Although some of the results are unsurprising, the analysis gives a clear insight into the attrition behaviour of the officers who entered service during the three enlistment periods (1965-70, 1971-76, and 1977-82). The results of the analysis are briefly outlined below.

• Non-Graduate vs Graduate - For each of the three enlistment periods, the attrition behaviour of non-graduates and graduates is not significantly different.

• Education Level - Education level has a strong relationship with the attrition behaviour of the officers. Officers with an 'O-' or 'A-' level qualification have consistently survived longer in the service than officers with any other educational qualification. In contrast, officers with a diploma qualification exhibit the lowest survival functions.

• Training Award - The difference in the survival functions between non-award and award holders follows roughly the same trend for the three enlistment groups.

• Support Vocation - The Engineering and Air Force support officers have their highest attrition rate during the first year of service. It drops to its lowest at the beginning of the third year, after which the attrition rate of the Engineering officers is generally higher than that of the other two categories. The Army support officers exhibit a relatively constant attrition rate throughout the entire period of study.

• Service Group - For the first six years of service, the Naval officers have a lower risk of leaving the service than their Army counterparts. After the first six years, the converse is true.


C. RECOMMENDATIONS FOR FUTURE STUDY 

Data on the officer’s extra-curriculum activities during his schoo] days, marital 
status, number of children, and the Officer Cadet School’s graduation grade are some 
of the interesting covariates that could be investigated in future studies. 

Having analyzed the attrition behaviour of the officers the next step would be to 
predict the number of officers in each rank leaving the service based on Singapore’s 
economic indicators (e.g., unemployment rate, inflation, gross national product, etc.). 

Another interesting area to look at is to check whether there is any significant 
difference in performance and CEP among officers of different vocations. 

It is hoped that the models developed in this thesis and the insights they provide 
will be beneficial to manpower planners and recruitment agencies. 
APPENDIX A: ADDITIONAL TWO-WAY TABLES OF CEP AND PERFORMANCE 


TABLE 6: TABLE OF EDUCATION LEVEL BY CEP FOR THE YEAR 1992 

EDUCATION (rows) by CEP 1992 (columns); cell entries are PERCENT, ROW PCT and COL PCT. 

                               TOTAL 
17.70      0.45      73.31 
24.15      0.62      58 
51.49     10.08       0.00 

16.68      d          0.11 
62.50     15.20       0.43 
48.51     89.92 

Statistic                        Value      Prob 
Chi-Square                       715.763    0.000 
Likelihood Ratio Chi-Square      716.807    0.000 
Mantel-Haenszel Chi-Square       608.738    0.000 
Phi Coefficient                    0.521 
Contingency Coefficient            0.462 
Cramer’s V                         0.521 


TABLE 7: TABLE OF TRAINING AWARD BY CEP FOR THE YEAR 1992 

TRAINING AWARD (rows: NON-AWARD HOLDER, AWARD HOLDER) by CEP 1992 (columns) 

Statistic                       DF    Value      Prob 
Chi-Square                       4    417.398    0.000 
Likelihood Ratio Chi-Square      4    467.530    0.000 
Mantel-Haenszel Chi-Square       1     29.695    0.000 
Phi Coefficient                         0.398 
Contingency Coefficient                 0.370 
Cramer’s V                              0.398 
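
The three association measures reported under each two-way table are simple functions of the Pearson chi-square. The following is a minimal Python sketch, not part of the original SAS output. It assumes a sample size of 2637 officers, which is not printed in Table 7; the figure is taken from the event and non-event totals (1609 + 1028) in the classification table of CEP Model 3 in Appendix C. It also assumes a 2 x 5 layout, as implied by the 4 degrees of freedom. The function name is illustrative only.

```python
import math


def association_from_chi_square(chi_square, n_obs, n_rows, n_cols):
    """Return (phi, contingency coefficient, Cramer's V) for an r x c table."""
    phi = math.sqrt(chi_square / n_obs)
    contingency = math.sqrt(chi_square / (chi_square + n_obs))
    # Cramer's V rescales phi by the smaller table dimension minus one.
    cramers_v = math.sqrt(chi_square / (n_obs * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


# Table 7: chi-square = 417.398. n_obs = 2637 is an ASSUMPTION (see text).
phi, contingency, cramers_v = association_from_chi_square(417.398, 2637, 2, 5)
print(f"phi={phi:.3f}  C={contingency:.3f}  V={cramers_v:.3f}")
```

When one of the two variables has only two levels, Phi and Cramer's V coincide, which is consistent with both being reported as 0.398 in Table 7.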


TABLE 8: TABLE OF LENGTH OF SERVICE BY CEP FOR THE YEAR 1992 


LENGTH OF SERVICE (rows) by CEP 1992 (columns) 

Statistic                       DF    Value       Prob 
Chi-Square                       8    1192.703    0.000 
Likelihood Ratio Chi-Square      8    1351.217    0.000 
Mantel-Haenszel Chi-Square       1     934.707    0.000 
Phi Coefficient                          0.672 
Contingency Coefficient                  0.558 
Cramer’s V                               0.475 

Statistic 


Chi-Square 

Likelihood Ratio Chi-Square 
Mantel-Haenszel Chi-Square 
Phi Coefficient 

Contingency Coefficient 
Cramer’s V 


8 223.649 0.000 

TABLE 10: TABLE OF AGE GROUP BY CEP FOR THE YEAR 1992 

AGE GROUP (rows) by CEP 1992 (columns); cell entries are PERCENT, ROW PCT and COL PCT. 

Age groups shown: 26 TO ≤ 30, 36 TO ≤ 40, 41 TO ≤ 45 

Statistic                        Value       Prob 
Chi-Square                       1208.219    0.000 
Likelihood Ratio Chi-Square      1314.311    0.000 
Mantel-Haenszel Chi-Square        663.447    0.000 
Phi Coefficient                     0.677 
Contingency Coefficient             0.560 
Cramer’s V                          0.338 

TABLE 11: TABLE OF EDUCATION LEVEL BY PERFORMANCE FOR THE YEAR 1992 

EDUCATION (rows) by PERFORMANCE 1992 (columns); cell entries are PERCENT, ROW PCT and COL PCT. 

                                                 TOTAL 
27.75     2.69    27.18    14.94     0.76    73.31 
37.85     3.67    37.07    20.37     1.03 
88.51    80.68    67.20    63.45    57.14 

 3.60     0.64    13.27     8.61     0.57 
13.49     2.41    49.72    32.24     2.13 
11.49    19.32    32.80    36.55    42.86 

Statistic                       DF    Value      Prob 
Chi-Square                       4    156.071    0.000 
Likelihood Ratio Chi-Square      4    170.985    0.000 
Mantel-Haenszel Chi-Square       1    149.270    0.000 
Phi Coefficient                         0.243 
Contingency Coefficient                 0.236 
Cramer’s V                              0.243 


TABLE 12: TABLE OF TRAINING AWARD BY PERFORMANCE FOR THE YEAR 1992 


TRAINING AWARD (rows: NON-AWARD HOLDER, AWARD HOLDER) by PERFORMANCE 1992 (columns) 

Statistic                       DF    Value      Prob 
Chi-Square                       4    110.410    0.000 
Likelihood Ratio Chi-Square      4    112.829    0.000 
Mantel-Haenszel Chi-Square       1     54.901    0.000 
Phi Coefficient                         0.205 
Contingency Coefficient                 0.200 
Cramer’s V                              0.205 


TABLE 13: TABLE OF LENGTH OF SERVICE BY PERFORMANCE FOR THE YEAR 1992 
LENGTH OF 
SERVICE PERFORMANCE 1992 

| 

} ROW PCT 

So _A__ | torar_— 


0.87 10.84 2.88 0.00 41.36 
2.11 26.21 6.97 
26.14 26.80 12.24 


| 770s 12 ; 18.04 11.22 
$2.77 32.82 
44.61 47.67 


11.56 
47.29 











Statistic                       DF    Value       Prob 
Chi-Square                       8    1022.428    0.000 
Likelihood Ratio Chi-Square      8    1104.442    0.000 
Mantel-Haenszel Chi-Square       1     784.896    0.000 
Phi Coefficient                          0.623 
Contingency Coefficient                  0.529 
Cramer’s V                               0.440 

TABLE 14: TABLE OF RANK SENIORITY BY PERFORMANCE FOR THE YEAR 1992 


| RANK 
SENIORITY PERFORMANCE 1992 


PERCENT 
ROW PCT 
COL PCT 











Statistic DF Value Prob 
Chi-Square 8 497.816 0.000 
Likelihood Ratio Chi-Square 8 507.451 0.000 
Mantel-Haenszel Chi-Square 1 353.103 0.000 
Phi Coefficient 0.434 

Contingency Coefficient 0.398 

Cramer’s V 0.307 

TABLE 15: TABLE OF AGE GROUP BY PERFORMANCE FOR THE YEAR 1992 













AGE | AGE GrouP_| PERFORMANCE 1992 
PERCENT | 
ROW PCT ! 
TOTAL | 
eS nC RE CRE = I Te ATE A I A EE TP TS EF ET AE CET za 
26.16 0.72 1.36 0.08 37.00 

70.70 1.95 ie 3.69 0.20 

83.43 21.59 21.46 5.80 5.71 




















3.68 1.06 18.69 11.30 0.45 
10.45 3.02 53.13 32.11 1.29 
11.73 31.82 46.20 4799 34.29 
1.18 0.95 10.05 9.14 0.68 
$5.34 43) 45.69 41.55 3.10 | 
3.75 28.41 24.84 38.8! 51.43 
36 TO < 40 0.30 0.19 1.78 0.99 0.08 
9.09 5.68 53.41 29.55 2.27 
0.97 5.68 4.40 4.19 5.71 
41 TO < 45 0.04 0.34 1.02 0.76 2.20 
1.72 15.52 46.55 34.48 
0.12 10.23 2.53 3.22 
2 46 0.00 0.08 0.23 
0.00 25.00 75.00 





0.00 2.27 








Statistic                       DF    Value       Prob 
Chi-Square                      20    1234.393    0.000 
Likelihood Ratio Chi-Square     20    1304.519    0.000 
Mantel-Haenszel Chi-Square       1     694.047    0.000 
Phi Coefficient                          0.684 
Contingency Coefficient                  0.565 
Cramer’s V                               0.342 

APPENDIX B: COMPARISON OF BINARY RESPONSE MODELS 

A sensitivity analysis is carried out to compare the outcomes obtained from three sets 
of covariate combinations fitted to the same data set. The variables that enter a model should be 
reasonable and practical, in addition to being the best-fitting covariates. 

Before proceeding further, it is necessary to discuss the statistics that are used to assess 
model fit. The Akaike Information Criterion (AIC) and Schwarz Criterion (SC) statistics under 
"Criteria for Assessing Model Fit" (see the example of SAS output in Appendix C) are primarily used 
for comparing different models for the same data. In general, when comparing models, lower values of 
these two statistics indicate a better model. [Ref. 7:p. 1088] 
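
Both criteria are penalized versions of -2 Log Likelihood. The following Python sketch, added for illustration, reproduces the AIC and SC printed for CEP Model 1 in Appendix C from its -2 LOG L value. It assumes the standard definitions AIC = -2 Log L + 2p and SC = -2 Log L + p ln(n), with p = 7 (six covariates plus the intercept) and n = 1235 (the 522 events and 713 non-events of the Model 1 classification table). The function names are illustrative only.

```python
import math


def aic(neg2_log_l, n_params):
    """Akaike Information Criterion from -2 Log L and the parameter count."""
    return neg2_log_l + 2 * n_params


def schwarz_criterion(neg2_log_l, n_params, n_obs):
    """Schwarz Criterion: the penalty grows with the log of the sample size."""
    return neg2_log_l + n_params * math.log(n_obs)


# CEP Model 1: -2 LOG L = 854.336 with intercept and covariates.
model1_aic = aic(854.336, 7)
model1_sc = schwarz_criterion(854.336, 7, 522 + 713)
print(f"AIC={model1_aic:.3f}  SC={model1_sc:.3f}")
```

Because SC penalizes each parameter by ln(n) rather than 2, it favours smaller models than AIC whenever n exceeds about 7 observations.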

The Score statistic gives a test for the joint significance of the explanatory variables in the model. 
This test considers only the independent variables, so no value is shown in the columns for "Intercept 
Only" and "Intercept and Covariates." The -2 LOG L row gives the statistics and a test for the effects of 
the covariates based on -2 Log Likelihood (see Pages 65, 68, 71, 75, 78 and 81). 
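
The chi-square in the -2 LOG L row is the difference between the intercept-only value and the value with covariates. The Python sketch below, added for illustration, recomputes it for CEP Model 2 (2250.754 and 1110.762, with 6 DF, as printed in Appendix C). The p-value uses the closed-form chi-square tail probability that holds for even degrees of freedom; the helper names are illustrative and are not SAS functions.

```python
import math


def lr_chi_square(neg2_log_l_intercept_only, neg2_log_l_with_covariates):
    """Likelihood-ratio chi-square for the joint effect of the covariates."""
    return neg2_log_l_intercept_only - neg2_log_l_with_covariates


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


statistic = lr_chi_square(2250.754, 1110.762)
p_value = chi2_sf_even_df(statistic, 6)
print(f"chi-square={statistic:.3f}  p={p_value:.3g}")
```

The resulting probability is far below 0.0001, the smallest value shown in the SAS listing.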

A. CURRENT ESTIMATED POTENTIAL 

The SAS outputs (Appendix C) for the three models indicate that the most desirable model for 
CEP estimation is Model 1 (AIC: 868.336; SC: 904.168), followed by Model 2 (AIC: 1124.762; SC: 
1162.575) and Model 3 (AIC: 2028.528; SC: 2057.915). However, a closer look at the parameter 
estimates of Model 1 shows evidence that multicollinearity may exist. The parameter estimates for 
the performance grades of the previous one and two years have different signs (P91: -0.1831; P90: 0.1456), 
indicating opposite effects for the same unit change in performance grade. This does not seem to make 
sense. Since the performance grades in the previous two years are likely to be highly intercorrelated, 
the computed estimates of the regression coefficients are unstable and their interpretation becomes 
tenuous. Hence, Model 2 is selected. 

The comparison is easier to appreciate by referring to Figure 13. The 
top graph plots sensitivity against percent false positive rate for the three models under 
consideration. The three curves are relatively close to each other, so the models differ only 
marginally. Although Model 3 marginally outperforms the other two models (for Percent 
False POS > 7) in predicting CEP for the MAJ* group, it has much poorer 
predictive power for CEP of the LTC** group (see bottom graph of Figure 13). 
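
The quantities plotted in these graphs come from the classification tables in Appendix C. The Python sketch below, added for illustration, shows how the five percentages in one row follow from its four counts, using the 0.500 probability-level row of CEP Model 1 (432 and 638 correct, 75 and 90 incorrect). The definitions are inferred from the printed values: False POS is the share of predicted events that are wrong, and False NEG the share of predicted non-events that are wrong. The function name is illustrative only.

```python
def classification_percentages(correct_event, correct_nonevent,
                               incorrect_event, incorrect_nonevent):
    """Percentages of one classification-table row from its four counts."""
    total = (correct_event + correct_nonevent
             + incorrect_event + incorrect_nonevent)
    predicted_events = correct_event + incorrect_event
    predicted_nonevents = correct_nonevent + incorrect_nonevent
    return {
        "correct": 100.0 * (correct_event + correct_nonevent) / total,
        # Observed events that were predicted as events.
        "sensitivity": 100.0 * correct_event
                       / (correct_event + incorrect_nonevent),
        # Observed non-events that were predicted as non-events.
        "specificity": 100.0 * correct_nonevent
                       / (correct_nonevent + incorrect_event),
        "false_pos": (100.0 * incorrect_event / predicted_events
                      if predicted_events else None),
        "false_neg": (100.0 * incorrect_nonevent / predicted_nonevents
                      if predicted_nonevents else None),
    }


row = classification_percentages(432, 638, 75, 90)
print({name: round(value, 1) for name, value in row.items()})
```

A rate is returned as `None` when its denominator is zero, which corresponds to the rows at probability levels 0.000 and 1.000 where no non-events or no events are predicted.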


B. PERFORMANCE 

Once again, Model 1 proves to be the most statistically desirable model if one compares the AIC 
and SC statistics of the three models. However, why should performance depend on C91 and C89, but 
not C90? All three variables measure the same characteristic (i.e., CEP, but in three consecutive 
years). Although CEP estimation is supposed to be conducted independently from year to year, we 
cannot totally discount the possibility of some intercorrelation. Hence, Model 2 is selected 
instead. 

The top and bottom graphs in Figure 14 show the plots of sensitivity against percent false positive 
rate, and specificity against percent false negative rate, respectively. Again, Model 3 outperforms the 
other two models for performance prediction of Group I+, but it is almost useless for prediction of Group 
II++, as seen by the large portion of the graph falling below the hypothetical curve. 


* Population with CEP of Senior MAJ and below 

** Population with CEP of LTC and above 

+ Population with performance grade of B minus and below 
++ Population with performance grade of B and above 

Figure 13: Comparison of CEP Binary Response Models. 

Figure 14: Comparison of Performance Binary Response Models. 





APPENDIX C: SAS PROGRAMS AND OUTPUTS 


A. BINARY RESPONSE MODEL 


1. Models for Current Estimated Potential 


//LOGREG1 JOB CLASS=A,USER=S6599,PASSWORD=LEE 
//*MAIN LINES=(99) 
// EXEC SAS 
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA 
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA 
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA 
//SYSIN DD * 





OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFIN1; 
INPUT 

@1 ID 

@12 DRANK 

@19 DOE 


e 

an 

om 

tT 

“ry 

4 
NNER KE NNNS 


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 4. 
@6 C92 2. 
@10 C91 2. 
@14 C90 2. 
@18 C89 2. 
; 


DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID 4. 
@8 P92 2. 
@12 P91 2. 
@16 P90 2. 
@20 P89 2. 
; 


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 


LGSVC = 92 - DOE; 
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1 ; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE; SET OFFREC; 
IF (STATUS NE 1) THEN DELETE ; 


IF (C92 LT 5) THEN CEP92 =0; 
IF (C92 GE 5) THEN CEP92 = 1 ; 


TITLE 'BINARY RESPONSE MODEL - CEP MODEL #1' ; 
TITLE2 'EVENT=CEP OF MAJ AND BELOW NON-EVENT=CEP OF LTC AND ABOVE' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS1 COVOUT ; 
MODEL CEP92 = EDU AWARD RANK LGSVC RSNR AGE SGD 
              P91 P90 P89 C91 C90 C89 
              / SELECTION=STEPWISE 
                SLE=0.1 
                SLS=0.12 
                DETAILS 
                CTABLE ; 

PROC PRINT DATA=BETAS1 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 1' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS2 COVOUT ; 
TITLE 'BINARY RESPONSE MODEL - CEP MODEL #2' ; 
TITLE2 'EVENT=CEP OF MAJ AND BELOW NON-EVENT=CEP OF LTC AND ABOVE' ; 
MODEL CEP92 = EDU AWARD RANK LGSVC RSNR AGE SGD P91 C91 
              / SELECTION=STEPWISE 
                SLE=0.1 
                SLS=0.12 
                DETAILS 
                CTABLE ; 

PROC PRINT DATA=BETAS2 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 2' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS3 COVOUT ; 
TITLE 'BINARY RESPONSE MODEL - CEP MODEL #3' ; 
TITLE2 'EVENT=CEP OF MAJ AND BELOW NON-EVENT=CEP OF LTC AND ABOVE' ; 
MODEL CEP92 = EDU AWARD RANK LGSVC RSNR AGE SGD 
              / SELECTION=STEPWISE 
                SLE=0.1 
                SLS=0.12 
                DETAILS 
                CTABLE ; 

PROC PRINT DATA=BETAS3 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 3' ; 


2. Outputs for Current Estimated Potential Models 


a. Model 1 


BINARY RESPONSE MODEL - CEP MODEL 
EVENT=CEP OF MAJ AND BELOW; NON-EVENT=CEP OF LTC AND ABOVE 


Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          1684.415       868.336      . 
SC           1689.534       904.168      . 
-2 LOG L     1682.415       854.336      828.080 with 6 DF (p=0.0001) 
Score        .             .             612.326 with 6 DF (p=0.0001) 


Analysis of Maximum Likelihood Estimates 


= [l= 
Variable Estimate Error Chi-Square Chi-Square Estimate 

[nrenceet | 1 | roaie7_| orees_| inizcae | ooo | | ss 
prank | 1 | 1490 | ses | cosa | oooes | -oserore | 0203 | 
a ee 
Cael ee ee 
Ca ee ee 
ra ee 
foo Tt |e 


Pisce [one [mmo | oom | sone [one 
0.1268 | 25.8877 Q.o001 | -0.452197 | oss | 

Association of Predicted Probabilities and Observed Responses 

Concordant = 92.6%          Somers’ D = 0.855 
Discordant =  7.0%          Gamma     = 0.859 
Tied       =  0.4%          Tau-a     = 0.418 
(372186 pairs)              c         = 0.928 
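
The four rank-correlation indexes in this block can be recovered from the concordant, discordant and tied percentages. The Python sketch below is an illustration added to the listing, not SAS code. It uses the usual definitions based on the number of event/non-event pairs, with 522 events and 713 non-events taken from the Model 1 classification table. Small differences in the third decimal (for example Somers' D) arise because the printed percentages are rounded.

```python
def association_indexes(pct_concordant, pct_discordant, pct_tied,
                        n_events, n_nonevents):
    """Somers' D, Gamma, Tau-a and c from pair percentages and group sizes."""
    conc = pct_concordant / 100.0
    disc = pct_discordant / 100.0
    tied = pct_tied / 100.0
    pairs = n_events * n_nonevents          # pairs with different responses
    n_obs = n_events + n_nonevents
    all_pairs = n_obs * (n_obs - 1) / 2.0   # every pair of observations
    return {
        "pairs": pairs,
        "somers_d": conc - disc,
        "gamma": (conc - disc) / (conc + disc),
        "tau_a": (conc - disc) * pairs / all_pairs,
        "c": conc + 0.5 * tied,
    }


indexes = association_indexes(92.6, 7.0, 0.4, 522, 713)
print({name: round(value, 3) for name, value in indexes.items()})
```

The index c equals the area under the curve of sensitivity against one minus specificity, so it summarizes the classification table that follows in a single number.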


Residual Chi-Square = 11.8757 with 7 DF (p=0.1047) 
Analysis of Variables Not in the Model 


Score Pr > 
Variable Chi-Square Chi-Square 


EDU 0.3294 0.5660 

AWARD 0.0011 0.9738 
LGSVC 0.0082 0.9280 
RSNR 0.1090 0.7413 
AGE 1.5338 0.2155 

P89 0.9744 0.3236 
C89 2.3354 0.1265 


NOTE: No (additional) variables met the 0.1 significance level for entry into the model. 


Summary of Stepwise Procedure 


Number Score Wald 


Ee ee AT 0.0001 
po2 {| co | | 2 | soars || 0.000 
pos fc Ps 5910 fone 
a ee ee ee 
ge ee 
Pos frank PTs snes TO oes 
a 
ee ee eee 

Classification Table 

            Correct     Incorrect              Percentages 
 Prob          Non-          Non-           Sensi-  Speci-  False  False 
Level  Event  Event  Event  Event  Correct  tivity  ficity    POS    NEG 
0.000    522      0    713      0     42.3   100.0     0.0   57.7    0.0 
0.020    516    150    563      6     53.9    98.9    21.0   52.2    3.8 
0.040    515    199    514      7     57.8    98.7    27.9   50.0    3.4 
0.060    515    208    505      7     58.5    98.7    29.2   49.5    3.3 
0.080    515    248    465      7     61.8    98.7    34.8   47.4    2.7 
0.100    512    292    421     10     65.1    98.1    41.0   45.1    3.3 
0.120    495    382    331     27     71.0    94.8    53.6   40.1    6.6 
0.140    491    471    242     31     77.9    94.1    66.1   33.0    6.2 
0.160    490    497    216     32     79.9    93.9    69.7   30.6    6.0 
0.180    488    519    194     34     81.5    93.5    72.8   28.4    6.1 
0.200    484    538    175     38     82.8    92.7    75.5   26.6    6.6 
0.220    480    551    162     42     83.5    92.0    77.3   25.2    7.1 
0.240    478    565    148     44     84.5    91.6    79.2   23.6    7.2 
0.260    474    569    144     48     84.5    90.8    79.8   23.3    7.8 
0.280    472    575    138     50     84.8    90.4    80.6   22.6    8.0 
0.300    470    589    124     52     85.7    90.0    82.6   20.9    8.1 
0.320    468    589    124     54     85.6    89.7    82.6   20.9    8.4 
0.340    462    604    109     60     86.3    88.5    84.7   19.1    9.0 
0.360    461    605    108     61     86.3    88.3    84.9   19.0    9.2 
0.380    452    619     94     70     86.7    86.6    86.8   17.2   10.2 
0.400    448    619     94     74     86.4    85.8    86.8   17.3   10.7 
0.420    447    627     86     75     87.0    85.6    87.9   16.1   10.7 
0.440    444    629     84     78     86.9    85.1    88.2   15.9   11.0 
0.460    441    631     82     81     86.8    84.5    88.5   15.7   11.4 
0.480    435    633     80     87     86.5    83.3    88.8   15.5   12.1 
0.500    432    638     75     90     86.6    82.8    89.5   14.8   12.4 
0.520    426    640     73     96     86.3    81.6    89.8   14.6   13.0 
0.540    414    647     66    108     85.9    79.3    90.7   13.8   14.3 
0.560    410    649     64    112     85.7    78.5    91.0   13.5   14.7 
0.580    405    656     57    117     85.9    77.6    92.0   12.3   15.1 
0.600    403    658     55    119     85.9    77.2    92.3   12.0   15.3 
0.620    401    660     53    121     85.9    76.8    92.6   11.7   15.5 
0.640    398    662     51    124     85.8    76.2    92.8   11.4   15.8 
0.660    382    667     46    140     84.9    73.2    93.5   10.7   17.3 
0.680    369    670     43    153     84.1    70.7    94.0   10.4   18.6 
0.700    355    675     38    167     83.4    68.0    94.7    9.7   19.8 
0.720    350    676     37    172     83.1    67.0    94.8    9.6   20.3 
0.740    347    681     32    175     83.2    66.5    95.5    8.4   20.4 
0.760    344    683     30    178     83.2    65.9    95.8    8.0   20.7 
0.780    340    683     30    182     82.8    65.1    95.8    8.1   21.0 
0.800    329    684     29    193     82.0    63.0    95.9    8.1   22.0 
0.820    316    686     27    206     81.1    60.5    96.2    7.9   23.1 
0.840    304    689     24    218     80.4    58.2    96.6    7.3   24.0 
0.860    292    693     20    230     79.8    55.9    97.2    6.4   24.9 
0.880    285    693     20    237     79.2    54.6    97.2    6.6   25.5 
0.900    261    696     17    261     77.5    50.0    97.6    6.1   27.3 
0.920    176    706      7    346     71.4    33.7    99.0    3.8   32.9 
0.940    145    709      4    377     69.1    27.8    99.4    2.7   34.7 
0.960     47    713      0    475     61.5     9.0   100.0    0.0   40.0 
0.980     33    713      0    489     60.4     6.3   100.0    0.0   40.7 
1.000      0    713      0    522     57.7     0.0   100.0    0.0   42.3 

b. Model 2 


Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          2252.754      1124.762      . 
SC           2258.156      1162.575      . 
-2 LOG L     2250.754      1110.762      1139.992 with 6 DF (p=0.0001) 
Score        .             .             787.643 with 6 DF (p=0.0001) 


Analysis of Maximum Likelihood Estimates 

Association of Predicted Probabilities and Observed Responses 

Concordant = 93.3%          Somers’ D = 0.867 
Discordant =  6.6%          Gamma     = 0.868 
Tied       =  0.2%          Tau-a     = 0.428 
(662838 pairs)              c         = 0.933 

Residual Chi-Square = 2.5633 with 3 DF (p=0.4640) 


Analysis of Variables Not in the Model 


Score Pr > 
Variable Chi-Square Chi-Square 


AWARD 1.3497 0.2453 
LGSVC 1.0153 0.3136 
SGD 0.3037 0.5816 


NOTE: No (additional) variables met the 0.1 significance level for entry into the model. 


Summary of Stepwise Procedure 


Score Wald | 
Chi-square | Chi-Square i | 

Classification Table 

            Correct     Incorrect              Percentages 
 Prob          Non-          Non-           Sensi-  Speci-  False  False 
Level  Event  Event  Event  Event  Correct  tivity  ficity    POS    NEG 
0.000    726      0    913      0     44.3   100.0     0.0   55.7      . 
0.020    720    214    699      6     57.0    99.2    23.4   49.3    2.7 
0.040    719    260    653      7     59.7    99.0    28.5   47.6    2.6 
0.060    717    297    616      9     61.9    98.8    32.5   46.2    2.9 
0.080    717    350    563      9     65.1    98.8    38.3   44.0    2.5 

         581    829     84    145     86.0    80.0    90.8   12.6   14.9 
         571    829     84    155     85.4    78.7    90.8   12.8   15.8 
         553    841     72    173     85.1    76.2    92.1   11.5   17.1 
         552    841     72    174     85.0    76.0    92.1   11.5   17.1 

         453    881     32    273     81.4    62.4    96.5    6.6   23.7 
         442    886     27    284     81.0    60.9    97.0    5.8   24.3 
         416    890     23    310     79.7    57.3    97.5    5.2   25.8 
         395    893     20    331     78.6    54.4    97.8    4.8   27.0 
         357    895     18    369     76.4    49.2    98.0    4.6   29.2 
         313    902     11    413     74.1    43.1    98.8    3.4   31.4 
         154    908      5    572     64.8    21.2    99.5    3.1   38.6 
          83    911      2    643     60.6    11.4    99.8    2.4   41.4 
          58    913      0    668     59.2     8.0   100.0    0.0   42.3 
           0    913      0    726     55.7     0.0   100.0      .   44.3 








c. Model 3 


Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          3528.592      2028.528      . 
SC           3534.470      2057.915      . 
-2 LOG L     3526.592      2018.528      1508.064 with 4 DF (p=0.0001) 
Score        .             .             1214.662 with 4 DF (p=0.0001) 


Analysis of Maximum Likelihood Estimates 





Standard 
Error 
















Parameter Wald Pr> Standardized Odds 
Variable DF Estimate Chi-Square Chi-Square Estimate Ratio 
30303 saor27_| 00001 | 20.704 


| 1 | o4a7o | ooses | asses 40303950 
Prank |_| 3708 | oss | 377260 -14s7eos | 0021 













0.2174 0.0275 62.3203 0.0001 0.610420 









Association of Predicted Probabilities and Observed Responses 


Concordant = 90.3% Somers’ D = 0.808 
Discordant = 9.5% Gamma = 0.810 
Tied = 0.3% Tau-a = 0.385 
(1654052 pairs) c = 0.904 


Residual Chi-Square = 4.8395 with 3 DF (p=0.1839) 
Analysis of Variables Not in the Model 


Score Pr > 
Variable Chi-Square Chi-Square 


AWARD 2.0350 0.1537 
LGSVC 1.7257 0.1890 
SGD 0.2818 0.5955 


NOTE: No (additional) variables met the 0.1 significance level for entry into the model. 





Summary of Stepwise Procedure 
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Classification Table 


Correct Incorrect Percentages 
Prob Non- Non- Sensi- Speci- False False 
Level Event Event Event Event Correct tivity ficity POS NEG 
0 61.0 100.0 0.0 39.0 : 
0.020 1606 109 $19 3 65.0 99.8 10.6 36.4 2.7 
5 65.7 99.7 12.5 35.9 3.8 
6 66.7 99.6 15.3 35.2 3.7 
0.080 1601 181 847 8 67.6 99.5 17.6 34.6 42 
0.100 1599 239 789 «10 69.7 99.4 23.2 33.0 4.0 
0.120 °"94 300 728 = 15 718 99.1 29.2 314 48 
0.140 1585 36} 667 24 73.8 98.5 35.1 296 6.2 
0.160 1576 418 610 33 75.6 979 40.7 27.9 7.3 
0.180 1565 490 538 44 779 = 97.3 47.7 25.6 8.2 
0.200 1554 53) 497 55 79.) 96.6 51.7 242 9.4 
0.220 1542 559 469 67 79.7 958 54.4 23.3 10.7 
0.240 1532 592 436 = 76 80.6 95.3 57.6 22.1 11.4 
0.260 1518 611 417 91 80.7 943 59.4 21.6 13.0 
0.280 1509 629 399 100 81.1 93.8 61.2 20.9 13.7 
0.300 1500 644 384 109 81.3 93.2 62.6 204 14.5 
0.320 1494 660 368 86115 81.7 92.9 64.2 19.8 148 
0.340 1488 679 349 «121 822 92.5 66.1 19.0 15.1 
0.360 1478 699 329 131 826 91.9 68.0 18.2 15.8 
0.380 1468 712 316 =14) 82.7 91.2 69.3 17.7 16.5 
0.400 1460 720 308 149 82.7 90.7 70.0 17.4 47.1 
0.420 1453 740 288 = =156 85. 90.3 72.0 16.5 17.4 
0.440 1445 761 267 = 164 83.7 898 74.0 15.6 17.7 
0.460 1439 777 251 + =170 84.0 89.4 75.6 14.9 18.0 
0.480 1426 797 231 = «183 84.3 88.6 77.5 13.9 18.7 
9.500 1406 810 218 203 840 87.4 78.8 13.4 20.0 
0.520 1395 821 207. 214 84.0 86.7 79.9 12.9 20.7 
0.540 1384 833 195 225 84.1 86.0 81.0 12.3. 21.3 
0.560 1363 846 182 246 83.8 847 82.3 11.8 22.5 


72 





Percentages 





Deeeeeeswese = eeeweeresere come went nmemeceew ewes teeenewereeererecwscccceessoeceseesneee= 


False False 
ficity POS NEG 


resem nessewen erm mwwwewesccas ce nneneeooe seer et essen ese sores sccceseesereccccccecesesocscecesces= 


Sensi- 


83.8 
83.6 
83.2 
82.9 
82.7 
82.7 
82.4 
81.8 
81.8 
81.5 
81.1 
80.8 


84.0 
83.3 
81.7 
80.7 
79.9 
79.2 
78.6 
77.3 
76.9 
76.3 
75.3 
7146 
74.4 
73.6 


Correct Incorrect 
Prob Non- Non- 
Level Event Event Event Event Correct tivity 
0.580 1352 857 W711 257 
0.600 1340 865 163 269 
0.620 1315 880 148 294 
0.640 1298 887 141 311 
0.660 1285 895 133 324 
0.680 1274 907 121 = 335 
0.700 1264 909 119 = (345 
0.720 1243 914 114 366 
0.740 1238 918 110 0 371 
0.760 1227 922 106 382 
0.780 1211 928 100 398 
0.800 1201 929 99 408 
0.820 1197 930 98 412 
0.840 1185 931 97 424 
0.860 1167 933 95 442 
0.880 1114 948 80 495 
0.900 971 962 66 638 
0.920 716 987 41 893 
0.940 301 1017 11 1308 
0.960 58 1026 2 1551 
0.980 § 1027 1 1604 
1.000 0 1028 0 1609 

3. Models for Performance Appraisal 


//LOGREG2 JOB CLASS=A,USER=S6599,PASSWORD=LEE 
//*MAIN LINES=(99) 
// EXEC SAS 
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA 
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA 
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA 
//SYSIN DD * 


OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFIN!; 


INPUT 
@1 
@12 
@19 
@36 
@43 
@45 
@50 
@58 
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23.1 
23.7 
25.0 
26.0 
26.6 
27.0 
27.5 
28.6 
28.8 
29.3 
30.0 
30.5 
30.7 
31.3 
32.1 
34.3 
39.9 
47.5 
$6.3 
60.2 
61.0 
61.0 





@66 AGE 
@71 SGD 


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 4. 
@6 C92 2. 
@10 C91 2. 
@14 C90 2. 
@18 C89 2. 
; 

DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID 4. 
@8 P92 2. 
@12 P91 2. 
@16 P90 2. 
@20 P89 2. 
; 


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 


LGSVC = 92 - DOE; 
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1 ; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE; SET OFFREC; 
IF (STATUS NE 1) THEN DELETE ; 


IF (P92 LT 11) THEN PERF92 = 0 ; 
IF (P92 GE 11) THEN PERF92 = 1 ; 


TITLE 'BINARY RESPONSE MODEL - PERFORMANCE MODEL #1' ; 
TITLE2 'EVENT=GRADE B MINUS AND BELOW NON-EVENT=GRADE B AND ABOVE' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS1 COVOUT ; 
MODEL PERF92 = EDU AWARD RANK LGSVC RSNR AGE SGD 
               P91 P90 P89 C91 C90 C89 
               / SELECTION=STEPWISE 
                 SLE=0.1 
                 SLS=0.12 
                 DETAILS 
                 CTABLE ; 

PROC PRINT DATA=BETAS1 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 1' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS2 COVOUT ; 
TITLE 'BINARY RESPONSE MODEL - PERFORMANCE MODEL #2' ; 
TITLE2 'EVENT=GRADE B MINUS AND BELOW NON-EVENT=GRADE B AND ABOVE' ; 
MODEL PERF92 = EDU AWARD RANK LGSVC RSNR AGE SGD P91 C91 
               / SELECTION=STEPWISE 
                 SLE=0.1 
                 SLS=0.12 
                 DETAILS 
                 CTABLE ; 

PROC PRINT DATA=BETAS2 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 2' ; 

PROC LOGISTIC DATA=ONE OUTEST=BETAS3 COVOUT ; 
TITLE 'BINARY RESPONSE MODEL - PERFORMANCE MODEL #3' ; 
TITLE2 'EVENT=GRADE B MINUS AND BELOW NON-EVENT=GRADE B AND ABOVE' ; 
MODEL PERF92 = EDU AWARD RANK LGSVC RSNR AGE SGD 
               / SELECTION=STEPWISE 
                 SLE=0.1 
                 SLS=0.12 
                 DETAILS 
                 CTABLE ; 

PROC PRINT DATA=BETAS3 ; 
TITLE2 'PARAMETER ESTIMATES AND COVARIANCE MATRIX - MODEL 3' ; 


4. Outputs for Performance Appraisal Models 
a. Model 1 
BINARY RESPONSE MODEL - PERFORMANCE MODEL 
EVENT=GRADE B MINUS AND BELOW; NON-EVENT=GRADE B AND ABOVE 


Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          1680.554      1256.221      . 
SC           1685.673      1292.053      . 
-2 LOG L     1678.554      1242.221      436.333 with 6 DF (p=0.0001) 
Score        .             .             370.328 with 6 DF (p=0.0001) 








Analysis of Maximum Likelihood Estimates 
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Variable Error Chi-Square Chi-Square 

Association of Predicted Probabilities and Observed Responses 

Concordant = 82.3%          Somers’ D = 0.648 
Discordant = 17.4%          Gamma     = 0.650 
Tied       =  0.3%          Tau-a     = 0.316 
(371004 pairs)              c         = 0.824 


Residual Chi-Square = 4.7942 with 7 DF (p=0.6851) 


Analysis of Variables Not in the Model 


Score Pr> 
Variable Chi-Square Chi-Square 


EDU 0.2783 0.5978 

LGSVC 0.0522 0.8192 
AGE 0.0254 0.8734 

SGD 0.1182 0.7310 
P90 1.3250 0.2497 
P89 0.0100 0.9202 
C90 1.5263 0.2167 


NOTE: No (additional) variables met the 0.1 significance level for entry into the model. 














Summary of Stepwise Procedure 


Score 
Chi-square 





Classification Table 

            Correct     Incorrect              Percentages 
 Prob          Non-          Non-           Sensi-  Speci-  False  False 
Level  Event  Event  Event  Event  Correct  tivity  ficity    POS    NEG 
0.020    719      0    516      0     58.2   100.0     0.0   41.8      . 
0.040    718      2    514      1     58.3    99.9     0.4   41.7   33.3 
0.060    718      5    511      1     58.5    99.9     1.0   41.6   16.7 

            Correct     Incorrect              Percentages 
 Prob          Non-          Non-           Sensi-  Speci-  False  False 
Level  Event  Event  Event  Event  Correct  tivity  ficity    POS    NEG 
0.440    626    310    206     93     75.8    87.1    60.1   24.8   23.1 
0.460    618    320    196    101     76.0    86.0    62.0   24.1   24.0 
0.480    608    333    183    111     76.2    84.6    64.5   23.1   25.0 
0.500    596    346    170    123     76.3    82.9    67.1   22.2   26.2 
0.520    586    353    163    133     76.0    81.5    68.4   21.8   27.4 
0.540    566    363    153    153     75.2    78.7    70.3   21.3   29.7 
0.560    551    369    147    168     74.5    76.6    71.5   21.1   31.3 
0.580    534    374    142    185     73.5    74.3    72.5   21.0   33.1 
0.600    526    382    134    193     73.5    73.2    74.0   20.3   33.6 
0.620    517    388    128    202     73.3    71.9    75.2   19.8   34.2 
0.640    484    399    117    235     71.5    67.3    77.3   19.5   37.1 
0.660    466    409    107    253     70.9    64.8    79.3   18.7   38.2 
0.680    451    421     95    268     70.6    62.7    81.6   17.4   38.9 
0.700    431    431     85    288     69.8    59.9    83.5   16.5   40.1 
0.720    407    437     79    312     68.3    56.6    84.7   16.3   41.7 
0.740    372    453     63    347     66.8    51.7    87.8   14.5   43.4 
0.760    347    460     56    372     65.3    48.3    89.1   13.9   44.7 
0.780    329    476     40    390     65.2    45.8    92.2   10.8   45.0 
0.800    306    477     39    413     63.4    42.6    92.4   11.3   46.4 
0.820    281    486     30    438     62.1    39.1    94.2    9.6   47.4 
0.840    254    488     28    465     60.1    35.3    94.6    9.9   48.8 
0.860    225    492     24    494     58.1    31.3    95.3    9.6   50.1 
0.880    197    497     19    522     56.2    27.4    96.3    8.8   51.2 
0.900    177    505     11    542     55.2    24.6    97.9    5.9   51.8 
0.920    135    509      7    584     52.1    18.8    98.6    4.9   53.4 
0.940    113    513      3    606     50.7    15.7    99.4    2.6   54.2 
0.960     58    516      0    661     46.5     8.1   100.0    0.0   56.2 
0.980     23    516      0    696     43.6     3.2   100.0    0.0   57.4 
1.000      0    516      0    719     41.8     0.0   100.0      .   58.2 

b. Model 2 

Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          2185.699      1621.567      . 
SC           2191.101      1648.576      . 
-2 LOG L     2183.699      1611.567      572.132 with 4 DF (p=0.0001) 
Score        .             .             487.594 with 4 DF (p=0.0001) 
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Analysis of Maximum Likelihood Estimates 






a 
Parameter Standard Wald Pr> Standardized Odds 

Estimate Error Chi-Square Chi-Square Estimate Ratio 

Pt [rior | oaier | 301209 | ooo -—__} woe | 
a 0.1279 73.3601 0.0001 0.404822 | 
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Association of Predicted Probabilities and Observed Responses 


Concordant = 82.3% Somers’ D = 0.653 
Discordant = 17.0% Gamma = 0.657 
Tied = 0.6% Tau-a = 0.309 
(635670 pairs) c = 0.826 


Residual Chi-Square = 4.0442 with 5 DF (p=0.5431) 
Analysis of Variables Not in the Model 


Score Pr > 
Variable Chi-Square Chi-Square 


EDU 0.6176 0.4319 
AWARD 0.7092 0.3997 
LGSVC 0.5795 0.4465 
AGE 0.1949 0.6589 
SGD 0.0000 0.9990 


NOTE: No (additional) variables met the 0.1 significance level for entry into the model. 


Summary of Stepwise Procedure 


a Score Wald Pr > 
Chi-square | Chi-Square Chi- 
Square 


298.9 


115.1 
50.0724 
78.2818 
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Classification Table 


Correct Incorrect Percentages 


Prob Non- Non- Sensi- Speci- False False 
Level Event Event Event Event Correct tivity ficity POS NEG 


ed 











Correct Incorrect Percentages 


weweweeeenee coceccocencccsccsscocccecccecs. eeererensoressecceseo= 





Prob Non- Non- Sensi- Speci- False False 
Level Event Event Event Event Correct tivity ficity POS NEG 
0.880 276 609 21 733 $4.0 27.4 96.7 7.1 54.6 
0.900 236 612 18 773 51.7 23.4 97.1 7.1 55.8 
0.920 203 619 11 = 806 $02 20.1 98.3 5.1 56.6 
0.940 174 625 5 835 48.7 172 99.2 2.8 57.2 








0.960 84 629 1 925 43.5 8.3 99.8 12 59.5 

0.980 47 629 1 962 412 47 99.8 2.1 60.5 

1.000 0 630 0 1009 38.4 0.0 100.0 . 616 
c. Model 3 


Criteria for Assessing Model Fit 

                           Intercept 
             Intercept     and 
Criterion    Only          Covariates    Chi-Square for Covariates 
AIC          2958.381      2468.773      . 
SC           2964.258      2509.914      . 
-2 LOG L     2956.381      2454.773      501.608 with 6 DF (p=0.0001) 
Score        .             .             486.441 with 6 DF (p=0.0001) 


Analysis of Maximum Likelihood Estimates 


eee 
Variable Estimate Error Chi-Square | Chi-Square Ratio 
fevers [0 [ew [osm [oe [ons [Te 
[ssw | on | some | goo so | 
oasis | or | maze | oo | 
“18529 0.000 0st 
4460 see [ieee | 0.000 











Association of Predicted Probabilities and Observed Responses 

Concordant = 79.5%          Somers’ D = 0.595 
Discordant = 20.1%          Gamma     = 0.597 
Tied       =  0.4%          Tau-a     = 0.222 
(1298210 pairs)             c         = 0.797 

Residual Chi-Square = 0.2882 with 1 DF (p=0.5914) 


81 








Analysis of Variables Not in the Model 


Score Pr > 
Variable Chi-Square Chi-Square 


LGSVC 0.2882 0.5914 


Summary of Stepwise Procedure 


Score Wald 
Chi-square | Chi-Square i 
uare_ 


0.0534 


Classification Table 
Correct Incorrect Percentages 


Prob Sensi- Speci- False False 
Level Event Event Event Event Correct tivity ficity POS NEG 





Correct Incorrect Percentages 
Prob Non- Non- Sensi- Speci- False False 
Level Event Event Event Event Correct tivity ficity POS NEG 


eee emee coroner cswcneces see rreece sense ete ewe se scene crew eccesccscow cc ssseseccseecewesooescasercs 


809 65.8 59.2 85.8 7.3 59.0 

915 626 53.8 89.3 6.2 61.0 
1048 58.6 47.1 93.4 44 63.1 
1175 54.3 40.7 95.3 3.7 65.3 
1389 46.8 29.9 97.9 2.3 68.4 
1524 42.1 23.1 99.5 0.7 70.0 
1704 35.3 14.0 99.8 0.4 72.3 
1846 30.0 6.9 100.0 0.0 73.8 
198] 24.9 0.1 100.0 0.0 75.2 


1.000 0 655 1982 24.8 0.0 100.0 . 75.2 
B. POLYTOMOUS RESPONSE MODEL 
1. Current Estimated Potential Model 
a. Predicted Probabilities and 95% Confidence Intervals 


//LOGREG3 JOB CLASS=A,USER=S6599,PASSWORD=LEE 
//*MAIN LINES=(99) 
// EXEC SAS 
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA 
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA 
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA 
//SYSIN DD * 


OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFINI; 
INPUT 

@! 

@12 

@19 


NNr RK RP RK NNN S 


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 4. 
@6 C92 2. 
@10 C91 2. 
@14 C90 2. 
@18 C89 2. 
; 

DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID 4. 
@8 P92 2. 
@12 P91 2. 
@16 P90 2. 
@20 P89 2. 
; 


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 


IF (STATUS NE 1) THEN DELETE ; 
LGSVC = 92 - DOE; 

RSNR = 92 - DRANK; 

IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = I ; 








IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE (DROP = LEFT STATUS) TWO; SET OFFREC; 


IF (C92 LE 2) THEN CEP92 = I ; 
IF (C92 GT 2 AND C92 LE 4) THEN CEP92 = 2 ; 
IF (C92 GT 4 AND C92 LE 6) THEN CEP92 = 3 ; 
IF (C92 GT 6) THEN CEP92 = 4 ; 


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 
ELSE OUTPUT TWO; 


TITLE 'STEPWISE LOGISTIC REGRESSION - POLYTOMOUS RESPONSE MODEL';
TITLE2 'CURRENT ESTIMATED POTENTIAL MODEL' ;
TITLE3 '1=CPT RANK 2=MAJ RANK 3=LTC RANK 4=COL AND ABOVE RANK' ;


PROC LOGISTIC DATA=ONE OUTEST=BETAS COVOUT ;
MODEL CEP92 = EDU AWARD RANK LGSVC RSNR AGE SGD P91 C91
/ SELECTION=STEPWISE
SLE=0.1
SLS=0.12
DETAILS
;
OUTPUT OUT=PRED P=PHAT LOWER=LCL UPPER=UCL ;


PROC PRINT DATA=BETAS ;
TITLE3 'PARAMETER ESTIMATES AND COVARIANCE MATRIX' ;


PROC PRINT DATA=PRED ;
TITLE3 'PREDICTED PROBABILITIES AND 95% CONFIDENCE LIMITS' ;
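
The data preparation in this program does two things before the model is fitted: it collapses the raw C92 score into four CEP92 categories, and it sends roughly 45% of the officers to the fitting set ONE and the rest to the hold-out set TWO. The following Python sketch illustrates both steps under stated assumptions. The function names are hypothetical, C92 is assumed to be non-missing, and Python's random generator will not reproduce the RANUNI stream, so the actual membership of each set differs from the SAS run even with the same seed.

```python
import random


def cep_category(c92):
    """Collapse a raw (non-missing) C92 score into the four CEP92 categories."""
    if c92 <= 2:
        return 1
    if c92 <= 4:
        return 2
    if c92 <= 6:
        return 3
    return 4


def split_records(records, fraction=0.45, seed=12345678):
    """Send each record to ONE with probability `fraction`, else to TWO."""
    rng = random.Random(seed)  # stand-in for RANUNI, not the same stream
    one, two = [], []
    for record in records:
        if rng.random() < fraction:
            one.append(record)
        else:
            two.append(record)
    return one, two


categories = [cep_category(score) for score in (1, 2, 3, 4, 5, 6, 7, 9)]
one, two = split_records(list(range(1000)))
print(categories, len(one), len(two))
```

Because each record is assigned independently, the 45/55 proportion holds only approximately in any one run.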


b. Verification Program 


//VERIFY3 JOB CLASS=A,USER=S6599,PASSWORD=LEE
//*MAIN LINES=(99)
// EXEC SAS
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA
//SYSIN DD *


OPTIONS LS=80; 

DATA GENREC; 

INFILE EXTFIN1;

INPUT
@1 ID 4.
@12 DRANK 2.
@19 DOE 2.
@36 LEFT
@43 STATUS
@45 EDU
@50 TRAWD
@58 RANK
@66 AGE
@71 SGD
;


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 
@6 C92 
@10 C91 
@14 C90 
@18 C89
;


DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID
@8 P92
@12 P91
@16 P90
@20 P89
;


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 


IF (STATUS NE 1) THEN DELETE ; 


LGSVC = 92 - DOE; 
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1 ; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE TWO; SET OFFREC;


IF (C92 LE 2) THEN CEP92 = 1 ; 
IF (C92 GT 2 AND C92 LE 4) THEN CEP92 = 2; 
IF (C92 GT 4 AND C92 LE 6) THEN CEP92 = 3 ; 
IF (C92 GT 6) THEN CEP92 = 4 ; 


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 
ELSE OUTPUT TWO; 





DATA THREE; SET ONE; 


INT1 = -0.2548;
INT2 = 4.1478;
INT3 = 9.7599;


BTX = EDU * (-0.1952) + RANK * (-2.2155) + RSNR * (-0.1560)
+ AGE * (0.2379) + P91 * (-0.1374) + C91 * (-1.2298) ;


NUM1 = EXP(INT1+BTX) ;
DEN1 = (1+NUM1) ;
GAMMA1 = NUM1/DEN1 ;


NUM2 = EXP(INT2+BTX) ;
DEN2 = (1+NUM2) ;
GAMMA2 = NUM2/DEN2 ;


NUM3 = EXP(INT3+BTX) ;
DEN3 = (1+NUM3) ;
GAMMA3 = NUM3/DEN3 ;


P1 = GAMMA1 ;
P2 = GAMMA2 - GAMMA1 ;
P3 = GAMMA3 - GAMMA2 ;
P4 = 1 - GAMMA3 ;


DATA FOUR (KEEP = ID CEP92 GAMMA1 GAMMA2 GAMMA3 P1 P2 P3 P4); SET THREE;
PROC PRINT; 
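
The verification step turns the fitted intercepts and slopes into cumulative probabilities and then differences them to obtain the probability of each CEP category. The Python sketch below mirrors that arithmetic for a single officer. The coefficients are the ones in the listing; the function name and the covariate values in `example` are hypothetical and serve only to exercise the formulas.

```python
import math

# Intercepts and slopes as they appear in the verification program.
INTERCEPTS = [-0.2548, 4.1478, 9.7599]
COEFFS = {"EDU": -0.1952, "RANK": -2.2155, "RSNR": -0.1560,
          "AGE": 0.2379, "P91": -0.1374, "C91": -1.2298}


def category_probabilities(officer, intercepts=INTERCEPTS, coeffs=COEFFS):
    """Return [P1, ..., Pk] from the cumulative logit model."""
    btx = sum(coeffs[name] * officer[name] for name in coeffs)
    # Cumulative probability for category j: exp(a_j + b'x) / (1 + exp(a_j + b'x)).
    gammas = []
    for intercept in intercepts:
        num = math.exp(intercept + btx)
        gammas.append(num / (1.0 + num))
    # Differencing the cumulative probabilities gives the category probabilities.
    bounds = [0.0] + gammas + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical covariate values, chosen only for illustration.
example = {"EDU": 3, "RANK": 2, "RSNR": 4, "AGE": 30, "P91": 8, "C91": 3}
probs = category_probabilities(example)
print([round(p, 4) for p in probs])
```

Because the intercepts increase, the cumulative probabilities increase as well, so every difference is positive and the four probabilities sum to one. A very large linear predictor would overflow `math.exp` in this direct form; the sketch keeps the form used in the listing for clarity.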


c. Cross-Validation of Model 


//XVALID3 JOB CLASS=A,USER=S6599,PASSWORD=LEE
//*MAIN LINES=(99)
// EXEC SAS
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA
//SYSIN DD *


OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFIN1;
INPUT
@1 ID
@12 DRANK
@19 DOE
@36 LEFT
@43 STATUS
@45 EDU
@50 TRAWD
@58 RANK
@66 AGE
@71 SGD
;


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 
@6 C92 
@10 C91 
@14 C90 
@18 C89 


;


DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID
@8 P92
@12 P91
@16 P90
@20 P89
;


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 
IF (STATUS NE 1) THEN DELETE ; 
LGSVC = 92 - DOE; 
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE TWO; SET OFFREC; 


IF (C92 LE 2) THEN CEP92 = 1 ; 


IF (C92 GT 2 AND C92 LE 4) THEN CEP92 = 2 ; 
IF (C92 GT 4 AND C92 LE 6) THEN CEP92 = 3 ; 


IF (C92 GT 6) THEN CEP92 = 4 ; 


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 


ELSE OUTPUT TWO; 
DATA THREE; SET TWO; 


INT1 = -0.2548;
INT2 = 4.1478;
INT3 = 9.7599;


BTX = EDU * (-0.1952) + RANK * (-2.2155) + RSNR * (-0.1560) 
+ AGE * (0.2379) + P91 * (-0.1374) + C91 * (-1.2298) ; 


NUM1 = EXP(INT1+BTX) ;
DEN1 = (1+NUM1) ;
GAMMA1 = NUM1/DEN1 ;


NUM2 = EXP(INT2+BTX) ; 
DEN2 = (1+NUM2) ; 
GAMMA2 = NUM2/DEN2 ; 


NUM3 = EXP(INT3+BTX) ; 
DEN3 = (1+NUM3) ; 
GAMMA3 = NUM3/DEN3 ; 


P1 = GAMMA1 ;
P2 = GAMMA2 - GAMMA1 ;
P3 = GAMMA3 - GAMMA2 ;
P4 = 1 - GAMMA3 ;


IF (P1 EQ .) THEN GROUP =. ;
ELSE IF (P1 GT P2) AND (P1 GT P3) AND (P1 GT P4) THEN GROUP = 1 ;
ELSE IF (P2 GT P1) AND (P2 GT P3) AND (P2 GT P4) THEN GROUP = 2; 
ELSE IF (P3 GT P1) AND (P3 GT P2) AND (P3 GT P4) THEN GROUP = 3 ; 
ELSE GROUP = 4; 


IF (GROUP EQ .) THEN MATCH = 'MISSING' ;
ELSE IF CEP92 EQ GROUP THEN MATCH = 'CORRECT' ;
ELSE MATCH = 'WRONG' ;

DATA FOUR (KEEP = ID CEP92 P1 P2 P3 P4 GROUP MATCH); SET THREE;
PROC PRINT;


TITLE 'ONE WAY FREQUENCY TABLE' ;
PROC FREQ ; 

TABLES MATCH ; 
RUN; 
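
The cross-validation program assigns each hold-out officer to the category with the largest predicted probability and then tabulates how often that category matches the observed CEP92. A minimal Python sketch of the same rule follows. The function names and the three sample records are hypothetical; the tie-handling deliberately copies the IF/ELSE chain in the listing, where a category wins only if it is strictly larger than all others and the last category is the fallback.

```python
from collections import Counter


def predicted_group(probs):
    """Return the 1-based category chosen by the IF/ELSE chain."""
    if probs[0] is None:          # corresponds to IF (P1 EQ .)
        return None
    for i, p in enumerate(probs[:-1]):
        others = [q for j, q in enumerate(probs) if j != i]
        if all(p > q for q in others):
            return i + 1
    return len(probs)             # corresponds to the final ELSE branch


def match_table(records):
    """Count MISSING / CORRECT / WRONG over (observed, probs) pairs."""
    counts = Counter()
    for observed, probs in records:
        group = predicted_group(probs)
        if group is None:
            counts["MISSING"] += 1
        elif group == observed:
            counts["CORRECT"] += 1
        else:
            counts["WRONG"] += 1
    return counts


# Hypothetical hold-out records: (observed CEP92, [P1, P2, P3, P4]).
sample = [
    (2, [0.10, 0.60, 0.20, 0.10]),
    (1, [0.40, 0.40, 0.10, 0.10]),   # tie, so the chain falls through to 4
    (3, [None, None, None, None]),
]
table = match_table(sample)
print(dict(table))
```

The tie case shows a side effect of the strict comparisons: when the two largest probabilities are equal, the officer is placed in the last category rather than in either of the tied ones.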

2. Performance Model


a. Predicted Probabilities and 95% Confidence Intervals 


//LOGREG4 JOB CLASS=A,USER=S6599,PASSWORD=LEE
//*MAIN LINES=(99)
// EXEC SAS
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA
//SYSIN DD *


OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFIN1; 
INPUT 
@1 ID 
@12 DRANK 
@19 DOE 
@36 LEFT 
@43 STATUS 
@45 EDU 
@50 TRAWD 
@58 RANK 
@66 AGE 
@71 SGD 


;


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@1 ID 
@6 C92 
@10 C91 
@14 C90 
@18 C89
;


DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID
@8 P92
@12 P91
@16 P90
@20 P89
;


DATA OFFREC; 
MERGE GENREC CEPREC PERFREC; 
BY ID; 


IF (STATUS NE 1) THEN DELETE ;
LGSVC = 92 - DOE; 

RSNR = 92 - DRANK; 

IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1 ; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 

DATA ONE (DROP = LEFT STATUS) TWO; SET OFFREC;


IF (P92 LE 3) THEN PERF92 = 1 ; 

IF (P92 GT 3 AND P92 LE 6) THEN PERF92 = 2 ; 

IF (P92 GT 6 AND P92 LE 9) THEN PERF92 = 3 ; 

IF (P92 GT 9 AND P92 LE 12) THEN PERF92 = 4 ; 
IF (P92 GT 12) THEN PERF92 = 5 ; 


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 
ELSE OUTPUT TWO; 


TITLE 'STEPWISE LOGISTIC REGRESSION - POLYTOMOUS RESPONSE MODEL';
TITLE2 'PERFORMANCE CLASSIFICATION MODEL' ;
TITLE3 '1=E GRADE 2=D GRADE 3=C GRADE 4=B GRADE 5=A GRADE' ;


PROC LOGISTIC DATA=ONE OUTEST=BETAS COVOUT ;
MODEL PERF92 = EDU AWARD RANK LGSVC RSNR AGE SGD P91 C91
/ SELECTION=STEPWISE
SLE=0.1
SLS=0.12
DETAILS
;
OUTPUT OUT=PRED P=PHAT LOWER=LCL UPPER=UCL ;


PROC PRINT DATA=BETAS ;
TITLE3 'PARAMETER ESTIMATES AND COVARIANCE MATRIX' ;


PROC PRINT DATA=PRED ;
TITLE3 'PREDICTED PROBABILITIES AND 95% CONFIDENCE LIMITS' ;


b. Verification Program 


//VERIFY4 JOB CLASS=A,USER=S6599,PASSWORD=LEE
//*MAIN LINES=(99)
// EXEC SAS
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA
//SYSIN DD *


OPTIONS LS=80; 
DATA GENREC; 
INFILE EXTFIN1;
INPUT
@1 ID
@12 DRANK
@19 DOE
@36 LEFT
@43 STATUS
@45 EDU
@50 TRAWD
@58 RANK
@66 AGE
@71 SGD
;


DATA CEPREC; 
INFILE EXTFIN2; 


INPUT
@1 ID
@6 C92
@10 C91
@14 C90
@18 C89
;


DATA PERFREC; 
INFILE EXTFIN3; 


INPUT
@1 ID
@8 P92
@12 P91
@16 P90
@20 P89
;


DATA OFFREC; 


MERGE GENREC CEPREC PERFREC; 


BY ID; 


IF (STATUS NE 1) THEN DELETE ; 


LGSVC = 92 - DOE; 
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE TWO; SET OFFREC; 


IF (P92 LE 3) THEN PERF92 = 1 ; 

IF (P92 GT 3 AND P92 LE 6) THEN PERF92 = 2 ; 
IF (P92 GT 6 AND P92 LE 9) THEN PERF92 = 3 ; 
IF (P92 GT 9 AND P92 LE 12) THEN PERF92 = 4 ; 
IF (P92 GT 12) THEN PERF92 = 5 ; 


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 
ELSE OUTPUT TWO; 




DATA THREE; SET ONE; 


INT1 = 1.2353; 
INT2 = 2.0194; 
INT3 = 5.7756; 
INT4 = 9.8619; 
BTX = RANK*(0.8447) + RSNR*(-0.2118) + P91*(-0.3637) + C91*(-0.5939) ;


NUM1 = EXP(INT1+BTX) ;
DEN1 = (1+NUM1) ;
GAMMA1 = NUM1/DEN1 ;


NUM2 = EXP(INT2+BTX) ; 
DEN2 = (1+NUM2) ; 
GAMMA2 = NUM2/DEN2 ; 


NUM3 = EXP(INT3+BTX) ; 
DEN3 = (1+NUM3) ; 
GAMMA3 = NUM3/DEN3 ; 


NUM4 = EXP(INT4+BTX) ; 
DEN4 = (1+NUM4) ; 
GAMMA4 = NUM4/DEN4 ; 


P1 = GAMMA1 ;
P2 = GAMMA2 - GAMMA1 ;
P3 = GAMMA3 - GAMMA2 ;
P4 = GAMMA4 - GAMMA3 ;
P5 = 1 - GAMMA4 ;


DATA FOUR (KEEP = ID PERF92 GAMMA1 GAMMA2 GAMMA3 GAMMA4 P1 P2 P3 P4 P5);
SET THREE;
PROC PRINT; 


c. Cross-Validation of Model


//XVALID4 JOB CLASS=A,USER=S6599,PASSWORD=LEE
//*MAIN LINES=(99)
// EXEC SAS
//EXTFIN1 DD DISP=SHR,DSN=MSS.S6599.GEN.DATA
//EXTFIN2 DD DISP=SHR,DSN=MSS.S6599.CEP.DATA
//EXTFIN3 DD DISP=SHR,DSN=MSS.S6599.PERF.DATA
//SYSIN DD *


OPTIONS LS=80; 

DATA GENREC; 

INFILE EXTFIN1; 

INPUT 
@1 ID 4.
@12 DRANK 2.
@19 DOE
@36 LEFT
@43 STATUS
@45 EDU
@50 TRAWD
@58 RANK
@66 AGE
@71 SGD
;


DATA CEPREC; 
INFILE EXTFIN2; 
INPUT 
@! ID 
@6 C92 
@10 C91 
@14 C90 
@18 C89 


;


DATA PERFREC; 
INFILE EXTFIN3; 
INPUT 
@1 ID 
@8 P92 
@12 P91 
@16 P90
@20 P89
;


DATA OFFREC;
MERGE GENREC CEPREC PERFREC;
BY ID;


IF (STATUS NE 1) THEN DELETE ; 


LGSVC = 92 - DOE;
RSNR = 92 - DRANK; 
IF (RSNR EQ 92) THEN RSNR =. ; 


IF (TRAWD EQ 0) THEN AWARD = 1 ; 
IF (TRAWD NE 0) THEN AWARD = 2 ; 


DATA ONE TWO; SET OFFREC; 


IF (P92 LE 3) THEN PERF92 = 1 ; 

IF (P92 GT 3 AND P92 LE 6) THEN PERF92 = 2 ; 
IF (P92 GT 6 AND P92 LE 9) THEN PERF92 = 3 ; 
IF (P92 GT 9 AND P92 LE 12) THEN PERF92 = 4 ; 
IF (P92 GT 12) THEN PERF92 = 5 ;


IF RANUNI(12345678) LT 0.45 THEN OUTPUT ONE; 
ELSE OUTPUT TWO; 


DATA THREE; SET TWO; 


INT1 = 1.2353;
INT2 = 2.0194;
INT3 = 5.7756;
INT4 = 9.8619;
BTX = RANK*(0.8447) + RSNR*(-0.2118) + P91*(-0.3637) + C91*(-0.5939) ;


NUM1 = EXP(INT1+BTX) ;
DEN1 = (1+NUM1) ;
GAMMA1 = NUM1/DEN1 ;


NUM2 = EXP(INT2+BTX) ;
DEN2 = (1+NUM2) ;
GAMMA2 = NUM2/DEN2 ;


NUM3 = EXP(INT3+BTX) ;
DEN3 = (1+NUM3) ;
GAMMA3 = NUM3/DEN3 ;


NUM4 = EXP(INT4+BTX) ;
DEN4 = (1+NUM4) ;
GAMMA4 = NUM4/DEN4 ;


P1 = GAMMA1 ;
P2 = GAMMA2 - GAMMA1 ;
P3 = GAMMA3 - GAMMA2 ;
P4 = GAMMA4 - GAMMA3 ;
P5 = 1 - GAMMA4 ;


IF (P1 EQ .) THEN GROUP =. ;
ELSE IF (P1 GT P2) AND (P1 GT P3) AND (P1 GT P4) AND (P1 GT P5)
THEN GROUP = 1;
ELSE IF (P2 GT P1) AND (P2 GT P3) AND (P2 GT P4) AND (P2 GT P5)
THEN GROUP = 2;
ELSE IF (P3 GT P1) AND (P3 GT P2) AND (P3 GT P4) AND (P3 GT P5)
THEN GROUP = 3 ; 

ELSE IF (P4 GT P1) AND (P4 GT P2) AND (P4 GT P3) AND (P4 GT P5) 
THEN GROUP = 4 ; 

ELSE GROUP = 5; 


IF (GROUP EQ .) THEN MATCH = 'MISSING' ;
ELSE IF PERF92 EQ GROUP THEN MATCH = 'CORRECT' ;
ELSE MATCH = 'WRONG' ;

DATA FOUR (KEEP = ID PERF92 P1 P2 P3 P4 P5 GROUP MATCH); SET THREE;
PROC PRINT;


TITLE 'ONE WAY FREQUENCY TABLE' ;
PROC FREQ ; 

TABLES MATCH ; 
RUN; 